ABSTRACT


HUGHES, GEORGE CRITTENDEN. Convergence Rate Analysis for Iterative
of JOSEPH C. DUNN.)

A large class of descent algorithms is analyzed for the problem
$\min_{\Omega} f$, with $\Omega$ a convex subset of a Banach space $X$, and
$f: X \to \mathbb{R}^1$ a differentiable functional. At each iteration a
feasible direction $\bar{x}_n - x_n$ is determined, where $\bar{x}_n$ is a
solution to the subproblem
$\min_{y \in \Omega} \{\langle f'(x_n), y - x_n\rangle + \frac{1}{2}\langle M_n(y - x_n), y - x_n\rangle\}$,
with $\{M_n\}$ a sequence of nonnegative linear operators with a uniform upper
bound, and step lengths are obtained from Goldstein's rule. If $f'$ is
Lipschitz continuous and $\Omega$ is bounded, then limit points of sequences
generated by this general scheme are extremals. A "worst case" convergence
rate estimate of $r_n = f(x_n) - \inf_{\Omega} f = O(n^{-1/3})$ for convex $f$
is shown to improve to $O(n^{-1})$ when either the condition numbers of the
operators in the sequence $\{M_n\}$ are bounded away from zero or
$0 \le \langle M_n u, u\rangle \le \langle f''(x)u, u\rangle$,
$\forall x \in \Omega$, $\forall u \in X$, $\forall n \ge 0$; under these
conditions a hierarchy of rate estimates exists ranging from finite
termination of the process to $r_n = O(n^{-1})$ depending on how fast $f$
grows near a unique minimizer $\xi$, i.e., depending on the value of $\nu$ in
either of the conditions
$\langle f'(\xi), x - \xi\rangle \ge \gamma\|x - \xi\|^{\nu}$ or
$f(x) - f(\xi) \ge \gamma\|x - \xi\|^{\nu}$, $\forall x \in \Omega$, some
$\gamma > 0$ and $\nu \in [1, \infty)$. A similar hierarchy of rate estimates
is established for Newton's method ($M_n = f''(x_n)$), also depending on the
growth of the convex functional $f$ near $\xi$.

For twice differentiable, possibly nonconvex functionals $f$, local
conditions on the growth of the quadratic approximation to $f$ at $\xi$ in
directions leading into $\Omega$ are given as sufficient to insure linear or
superlinear convergence of the sequence $\{\|x_n - \xi\|\}$ when the iterates
pass sufficiently near $\xi$ and the operators $M_n$ are either uniformly
positive definite or satisfy certain standard quasi-Newton conditions.

These results have potential applications to problems in optimal
control theory.

1. Introduction


Let $f$ be a real functional on a real Banach space $X$, i.e.,
$f: X \to \mathbb{R}^1$, and consider the following constrained optimization
problem:

(P)  $\min_{x \in \Omega} f(x)$

where $\Omega$ is a closed convex nonempty subset of $X$. A number of methods
for solving (P) generate sequences of approximations to the solution via the
following general process:

(1.1a)  $x_{n+1} = x_n + \omega_n(\bar{x}_n - x_n)$,  $\omega_n \in [0, 1]$,

where

(1.1b)  $\bar{x}_n \in \arg\min_{y \in \Omega} Q_n(y)$.

$Q_n(y)$ is a functional which approximates $f(y)$ near the vector $x_n$, and
$\omega_n$ is a steplength parameter. Three examples of methods from this
general class are:

A. The conditional gradient method corresponding to

$Q_n(y) = \langle f'(x_n), y - x_n\rangle$;

here $f'(x_n)$ is the Fréchet derivative of $f$ at $x_n$, and brackets,
$\langle u, v\rangle$, denote the value of an element $u \in X^*$, the dual of
$X$, operating on an element $v \in X$.

B. The "relaxed" form of the method of gradient projection corresponding
to

$Q_n(y) = \langle f'(x_n), y - x_n\rangle + \frac{1}{2\alpha_n}\|y - x_n\|^2$,  $\alpha_n \ge \alpha > 0$ for $n \ge 0$.

For this class of methods, $X$ is understood to be a Hilbert space.

C. The relaxed Newton's method corresponding to

$Q_n(y) = \langle f'(x_n), y - x_n\rangle + \frac{1}{2}\langle f''(x_n)(y - x_n), y - x_n\rangle$.

When the functionals $Q_n(y)$ are properly chosen, the vector
$\bar{x}_n - x_n$ will be a feasible direction, i.e., for sufficiently small
$\bar{\omega} \in (0, 1]$,

$f(x_n + \omega(\bar{x}_n - x_n)) < f(x_n)$  for $\omega \in (0, \bar{\omega})$,

provided $x_n$ is not an extremal (see (2.1)). There are many methods for
choosing suitable stepsize parameters $\omega_n$ which will insure that
$f(x_{n+1}) < f(x_n)$ when $\bar{x}_n - x_n$ is a feasible direction. Most
attempt to approximate the classical line minimization scheme in which one
chooses the smallest $\omega_n$ satisfying

(1.2)  $\omega_n \in \arg\min_{\omega \in [0, 1]} f(x_n + \omega(\bar{x}_n - x_n))$.

For most nonlinear problems, however, (1.2) cannot be solved exactly, and
methods which approximate (1.2) are necessary.
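
As a hedged illustration of process (1.1), the following Python sketch runs the conditional gradient method (example A) on a box in $\mathbb{R}^d$ for a quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$, a case in which the line minimization (1.2) has a closed form. The function name `conditional_gradient_box`, the box constraint, and the quadratic objective are assumptions chosen for the example; none of them comes from the thesis.

```python
def conditional_gradient_box(A, b, lo, hi, x0, iters=200):
    """Sketch of scheme (1.1) with Q_n(y) = <f'(x_n), y - x_n> on the box [lo, hi]^d.

    f(x) = 0.5 * x'Ax - b'x is an illustrative objective (assumption).
    """
    d = len(x0)
    x = list(x0)
    for _ in range(iters):
        # Gradient of the quadratic: g = A x - b.
        g = [sum(A[i][j] * x[j] for j in range(d)) - b[i] for i in range(d)]
        # (1.1b): a linear functional over a box is minimized coordinatewise
        # at the bound that opposes the sign of the gradient.
        xbar = [lo if g[i] > 0 else hi for i in range(d)]
        dirn = [xbar[i] - x[i] for i in range(d)]
        slope = sum(g[i] * dirn[i] for i in range(d))  # <f'(x_n), xbar_n - x_n> <= 0
        if slope >= -1e-14:
            break  # no descent along xbar_n - x_n: x_n is numerically an extremal
        curv = sum(dirn[i] * sum(A[i][j] * dirn[j] for j in range(d))
                   for i in range(d))
        # (1.2): exact line minimization on [0, 1], closed form for a quadratic.
        omega = 1.0 if curv <= 0 else min(1.0, -slope / curv)
        # (1.1a): move toward xbar_n.
        x = [x[i] + omega * dirn[i] for i in range(d)]
    return x
```

For a general nonlinear $f$ the closed-form step is unavailable, which is the reason approximate rules such as Goldstein's are used in place of (1.2).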

In addition to determining feasible directions, the functionals $Q_n(y)$
must have the property that subproblem (1.1b) is easy to solve relative
to (P). For methods such as the conditional gradient method or the method
of gradient projection on certain simple sets such as those often found in
optimal control theory, (1.1b) is trivial. For more complicated functionals
$Q_n(y)$ and constraint sets $\Omega$, the utility of such methods becomes
questionable. A number of authors have devised variations of the basic
scheme to make the subproblem (1.1b) more feasible. Han [1] and Garcia
Palomares and Mangasarian [2] minimize

$Q_n(y) = \langle f'(x_n), y - x_n\rangle + \frac{1}{2}\langle M_n(y - x_n), y - x_n\rangle$

over an approximation to $\Omega$ defined by linear inequalities in
$\mathbb{R}^n$. In their method $\{M_n\}$ is a sequence of operators which
approaches the second derivative operator of the Lagrangian of $f$ and the
constraints defining $\Omega$. Bertsekas [3] uses a hybrid Newton method
similar to gradient projection on simple sets such as orthants and cubes in
$\mathbb{R}^n$. Such modifications can have considerable practical importance
in special cases; however, the convergence behavior of the basic method (1.1)
itself is still only partially understood.

The purpose of this thesis is to establish the convergence properties
of the class of algorithms (1.1) in which $Q_n(y)$ is of the form

(1.3)  $Q_n(y) = \langle f'(x_n), y - x_n\rangle + \frac{1}{2}\langle M_n(y - x_n), y - x_n\rangle$

where each $M_n$ is a nonnegative bounded linear operator, i.e.,
$M_n \in BL(X, X^*)$ and

(1.4)  $0 \le \langle M_n u, u\rangle$,  $\forall u \in X$.


Although many different stepsize rules have been investigated for methods
in this general scheme (GS), the essential differences in the algorithms
lie in the selection of the operator sequence $\{M_n\}$ and not in the method
of choosing the stepsize. In fact, a number of papers have compared major
stepsize rules (see, e.g., [4], [5], [6]), and the basic conclusion is
that differences in convergence rates are minimal. In the analysis to
follow, the Goldstein rule described in Chapter 2 will be used since it is
prototypic of the rules for approximating line minimization (1.2). There
are several good reasons for carrying out the analysis in the setting of
a general Banach space; in particular, by retaining the maximum degree of
flexibility at the outset it is possible to obtain sharper bounds on
convergence rates for function space minimization problems later on
(see Remarks 3.2, 4.2).

The  methods  in  the  (GS)  have  for  the  most  part  been  analyzed  quite 
thoroughly  for  convex  differentiable  functionals  f  with  "regular"  minimizers. 
However,  when  f  is  non-convex  or  when  singularities  exist  at  the  minimizers, 
the  analysis  has  been  less  thorough  and  in  some  cases  sketchy.  Recent 
work  has  focused  on  understanding  the  behavior  of  the  algorithms  under 
these  less  tractable  conditions.  The  following  brief  review  of  the  major 
results  on  convergence  and  rate  of  convergence  of  methods  embedded  in  the 
(GS)  will  put  into  perspective  the  results  of  this  thesis. 

In Chapter 3 it will be shown that when $X$ is a Hilbert space and
$M_n = \frac{1}{\alpha_n} I$, where $I$ is the identity operator, then the
(GS) is the same as the method of gradient projection introduced by
Goldstein [7] in which

(1.5)  $x_{n+1} = P_{\Omega}(x_n - \alpha_n \nabla f(x_n))$.

Here $P_{\Omega}$ is the operation of projection onto $\Omega$ and
$\nabla f(x_n)$ is the Hilbert space representor of $f'(x_n)$ in $X$. The
parameter $\alpha_n$ is chosen to insure convergence with stepsize parameter
$\omega_n = 1$ for $n \ge 0$. Levitin and Poljak [8] first gave rate of
convergence results for this method for convex $f$ using the "threshold" rule


(1.6)  $0 < \epsilon_1 \le \alpha_n \le \frac{2}{L + 2\epsilon_2}$,  $\epsilon_2 > 0$,

where $L$ is a Lipschitz constant for $f'$, i.e.,

$L \ge \sup_{x, y \in \Omega,\ x \ne y} \frac{\|f'(x) - f'(y)\|}{\|x - y\|}$.
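
The iteration (1.5) can be sketched in a few lines when $\Omega$ is a box in $\mathbb{R}^d$, since projection onto a box is coordinatewise clipping. This is a minimal sketch under stated assumptions: the names `project_box` and `gradient_projection` are hypothetical, the step `alpha` is held constant, and the caller is assumed to pick it below $2/L$ in the spirit of a threshold rule.

```python
def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^d (coordinatewise clipping)."""
    return [min(max(v, lo), hi) for v in x]


def gradient_projection(grad, x0, alpha, lo, hi, iters=500):
    """Sketch of (1.5): x_{n+1} = P(x_n - alpha * grad f(x_n)) with omega_n = 1.

    grad is a callable returning the gradient as a list (assumption);
    alpha is a fixed step, assumed to satisfy 0 < alpha < 2/L.
    """
    x = project_box(x0, lo, hi)
    for _ in range(iters):
        g = grad(x)
        x = project_box([xi - alpha * gi for xi, gi in zip(x, g)], lo, hi)
    return x
```

With $f(x) = x_1^2 + x_2^2 - x_1 - 3x_2$ on $[0, 1]^2$ (so $L = 2$), any fixed `alpha` in $(0, 1)$ drives the iterates to the constrained minimizer $(0.5, 1)$.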


For functionals satisfying the uniform convexity condition

(1.7)  $\underline{\mu}\|u\|^2 \le \langle f''(x)u, u\rangle \le \bar{\mu}\|u\|^2$,  $\forall x \in \Omega$, $\forall u \in X$,
       and $0 < \underline{\mu} \le \bar{\mu} < \infty$,

the values $r_n = f(x_n) - \inf_{\Omega} f$ converge linearly, i.e.,
$r_n = O(\lambda^n)$ for some $\lambda \in [0, 1)$. In the absence of (1.7)
the convergence of $\{r_n\}$ for convex functionals will be at least like
$O(\frac{1}{n})$. Similar results were obtained by Demyanov and Rubinov [9]
who investigated four variations of relaxed gradient projection in which the
sequence $\{\alpha_n\}$ and the sequence of stepsize parameters are selected
by combinations of threshold rules like (1.6) and line minimization (1.2).
Dunn [10] found that the method (1.5) with the sequence $\{\alpha_n\}$
determined by a Goldstein-like rule converges linearly if the functional
grows near an optimal point, or extremal, $\xi \in \Omega$, like the square
of the distance from $\xi$, i.e.,

(1.8)  $f(x) - f(\xi) \ge \gamma\|x - \xi\|^2$,  $\forall x \in \Omega$, $\gamma > 0$.

This occurs when (1.7) is satisfied at $x = \xi$ or when the structure of the
set near the extremal is such that

(1.9)  $\langle f'(\xi), x - \xi\rangle \ge \gamma\|x - \xi\|^2$,  $\forall x \in \Omega$, $\gamma > 0$,

since in the case of convex functionals (1.9) implies (1.8). In fact,
Dunn was able to show for a wider class of functionals which are
pseudoconvex in the sense of Mangasarian [11], that a complete hierarchy of
rates can be determined from the condition

(1.10)  $f(x) - f(\xi) \ge \gamma\|x - \xi\|^{\nu}$,  $\forall x \in \Omega$, $\nu \in [1, \infty)$, $\gamma > 0$,

ranging from finite termination of the process (i.e., $x_N = \xi$ for some
$N \ge 0$) when $\nu = 1$ to rates approaching the "worst case" rate of
$O(\frac{1}{n})$ as $\nu$ assumes larger values.

The conditional gradient method [8], [9], [12] results when the
operators $M_n$ in the (GS) are the zero operator for $n \ge 0$. With
steplength rules of the line minimization type, this algorithm was shown in
[8] and [9] to converge at the rate $r_n = O(\frac{1}{n})$ for convex
functionals with Lipschitz continuous Fréchet derivatives on convex closed
bounded sets. In these investigations, however, it could not be shown that
conditions of the sort (1.7) had any effect on the convergence rate of the
conditional gradient method (cf. gradient projection method); a linear
convergence rate was established only under certain strong uniform convexity
conditions on the set $\Omega$ when $f'(\xi) \ne 0$. In fact, an example of
Canon and Cullum [13] shows that even when (1.7) is satisfied a rate of
$O(\frac{1}{n})$ cannot be improved upon without imposing conditions on the
set $\Omega$. Dunn [4] proved that uniform convexity of $\Omega$ is actually
a very strong sufficient condition for linear convergence of the sequence
$\{r_n\}$ and that the weaker condition (1.9) will suffice. Dunn [14] has
also shown that, as in [10], a hierarchy of convergence rate upper bounds
exists for the conditional gradient method depending on the value of the
parameter $\nu$ in the condition

(1.11)  $\langle f'(\xi), x - \xi\rangle \ge \gamma\|x - \xi\|^{\nu}$,  $\forall x \in \Omega$, $\nu \in [1, \infty)$, $\gamma > 0$.

Conditions of this type are satisfied in various Banach spaces by
"bang-bang" optimal controls [4], [17] (see Remarks 3.2, 4.2).

Allwright [15] and Barnes [16] both considered variations of the (GS)
in which specific operator sequences $\{M_n\}$ are used in certain optimal
control settings. Allwright specified operators which have the property

(1.12)  $0 \le \langle M_n u, u\rangle \le \langle f''(x)u, u\rangle$,  $\forall x \in \Omega$, $\forall u \in X$, $\forall n \ge 0$.

Although he was able to prove convergence using a stepsize rule similar
to Goldstein's on bounded sets with convex functionals, he established a
linear convergence rate for the sequence $\{r_n\}$ only when $\{M_n\}$
satisfies

(1.13)  $\mu\|u\|^2 \le \langle M_n u, u\rangle$,  $\forall u \in X$, $\forall n \ge 0$, $\mu > 0$,

which with (1.12) implies (1.7). Barnes also required condition (1.7)
with operators satisfying (1.13) to achieve linear rates for $\{r_n\}$.

If $f$ is convex the operator $f''$ is certainly nonnegative on $\Omega$ and
the Newton methods treated by Kantorovich [18], Goldstein [19], and
Levitin and Poljak [8] are formally in the (GS) with $M_n = f''(x_n)$ for
$n \ge 0$. Very little has been written about convergence rates for Newton's
method in the absence of the regularity condition (1.7) or when the second
derivative operator is not at least positive definite at the extremal.
Levitin and Poljak [8], who first proposed the constrained version of
Newton's method with $\omega_n = 1$, relied on condition (1.7) to prove
superlinear convergence of the sequence $\{\|x_n - \xi\|\}$ to zero. Danilin
[20] gave a proof of convergence of the method for convex functionals on
bounded sets with a stepsize rule similar to Goldstein's but, once again,
required condition (1.7) for rates. It was stated by Bulavskii [21] for
finite dimensional spaces that condition (1.7) can be relaxed to a condition
on the growth of the second order approximation to $f$ at the extremal $\xi$,
namely

(1.14)  $\langle f'(\xi), x - \xi\rangle + \frac{1}{2}\langle f''(\xi)(x - \xi), x - \xi\rangle \ge \gamma\|x - \xi\|^2$,  $\forall x \in \Omega$, $\gamma > 0$.

For convex functionals this condition insures superlinear convergence of
the sequence $\{\|x_n - \xi\|\}$. Dunn [22] independently formulated and
proved the same result in general Banach spaces and showed that when (1.14)
holds with the exponent 2 replaced by 1, then finite termination of the
process occurs.

The results mentioned so far have been restricted to convex or
pseudoconvex functionals. Although a number of articles have given
convergence results for these methods for general non-convex functionals
(e.g., [23], [5], [24]) there are very few convergence rate results. For
projected gradient methods Goldstein [23] proved that positive definiteness
of the second derivative operator at a local minimizer $\xi$ is sufficient to
give a linear rate of convergence of the sequence $\{\|x_n - \xi\|\}$ if
$x_n \to \xi$; however, for constrained minimization problems, this condition
is rather strong. It was shown by Bertsekas [24] that the second derivative
operator does not even have to be nonnegative at an extremal to achieve
linear convergence in projected gradient schemes. For certain simple sets
such as orthants and cubes in $\mathbb{R}^n$, Bertsekas proved that if the
first derivative at an extremal $\xi$ is positive in coordinate directions
leading into the set and the second derivative at $\xi$ is positive definite
in the subspace parallel to the manifold of active constraints, then iterates
generated by the gradient projection method and passing sufficiently near the
extremal will converge to the extremal at a linear rate. Similar conditions
are given by Han [1] and Garcia Palomares and Mangasarian [2] for their
quasi-Newton methods to achieve linear and superlinear rates of convergence
for sequences coming close enough to extremals. Their methods are
modifications of the (GS) as indicated earlier and are in fact included in
the (GS) when $\Omega$ is defined by linear inequalities in $\mathbb{R}^n$.

In Chapter 2 of the present thesis it will be shown that no matter
how the sequence $\{M_n\}$ is chosen, as long as the operators are
nonnegative and uniformly bounded above, every limit point of the generated
sequence will be an extremal, and if $f$ is convex the rate of convergence of
$\{r_n\}$ will be $r_n = O(n^{-1/3})$ at least.

A number of results will be established in Chapter 3 for the (GS)
when $\{M_n\}$ satisfies either condition (1.12) or a condition requiring a
uniform lower bound on the "condition numbers" of the operators. Note
that in gradient projection methods, the operators $\frac{1}{\alpha_n} I$
have condition numbers equal to 1. The "worst case" rate of convergence for
this subclass will be $O(\frac{1}{n})$ for the sequence $\{r_n\}$ when $f$ is
convex. This extends the results reported by Demyanov and Rubinov [9], who
considered only bounded sequences $\{\alpha_n\}$ for the relaxed gradient
projection method. Their rate of $r_n = O(\frac{1}{n})$ holds for any
sequence $\{\alpha_n\}$ which is bounded below and for stepsizes determined
by Goldstein's rule. A hierarchy of convergence rate upper bounds will be
established for this subclass, as was done in [10] for the gradient
projection method and [14] for the conditional gradient method. When
$\omega_n$ is bounded away from zero the higher rates of convergence depend
on the growth rate of $f$ near $\xi$ (see (1.10)). On the other hand, if
$\omega_n$ can be arbitrarily small, then higher rates will depend on how
slowly $\omega_n$ decreases, which, in turn, can be estimated in the presence
of conditions on the structure of the set near the extremal, i.e., condition
(1.11).

As indicated previously, results for Newton's method have been
superlinear rate estimates or better for the sequence $\{\|x_n - \xi\|\}$
under regularity conditions like (1.7) or (1.14). In Chapter 4 it is shown
that a hierarchy of rates for the sequence $\{r_n\}$ exists here for
non-regular extremals when condition (1.11) holds with $\nu$ in the range
$1 \le \nu \le 5$. Although somewhat incomplete these results corroborate the
belief that even in nonregular cases, Newton's method outperforms the first
order methods. These ideas are developed further in an example from optimal
control theory.

In Chapter 5 the results in [1], [2], [23], [24] for non-convex
functionals will be extended to the (GS) in Banach spaces. When $\{M_n\}$
satisfies (1.13) or when (1.9) is satisfied at $\xi$ and the second order
approximation to $f$ at $\xi$ satisfies

(1.17)  $\langle f'(\xi), x - \xi\rangle + \frac{1}{2}\langle f''(\xi)(x - \xi), x - \xi\rangle \ge \gamma\|x - \xi\|^2$,

for $x \in K_{\Omega}(\xi) \cap B_{\rho}(\xi)$ for some $\rho > 0$, where
$K_{\Omega}(\xi)$ is the tangent cone to $\Omega$ at $\xi$ with vertex at
$\xi$, i.e., $K_{\Omega}(\xi) = \{x \in X : \xi + t(x - \xi) \in \Omega$ for
some $t > 0\}$, and $B_{\rho}(\xi)$ is a closed ball of radius $\rho$ around
$\xi$, then if the sequence of iterates comes sufficiently near $\xi$, it
will converge to $\xi$ and $f(x_n) - f(\xi) = O(\lambda^n)$ for
$\lambda \in (0, 1)$. Condition (1.17) need hold only for
$x \in \Omega \cap B_{\rho}(\xi)$ if $M_n$ is symmetric as well as
nonnegative and $M_n$ approximates $f''(\xi)$ in one of the following four
ways: either

(1.18)  $\|M_n - f''(\xi)\| < \epsilon$

for $\epsilon$ sufficiently small and $n \ge N > 0$, or

(1.19)  $\frac{\|(M_n - f''(\xi))(x - \xi)\|}{\|x - \xi\|} < \epsilon$

for $\epsilon$ sufficiently small and for $x \in \Omega$ and $n \ge N > 0$, or

(1.20)  $\|M_n - f''(\xi)\| \to 0$  as $n \to \infty$, or

(1.21)  $\frac{\|(M_n - f''(\xi))(x - \xi)\|}{\|x - \xi\|} \to 0$

for $x \in \Omega$ as $n \to \infty$.


For sequences in which $x_n$ comes close enough to $\xi$ for $n \ge N$,
$x_n$ will converge to $\xi$ at a linear rate of convergence when either
(1.18) or (1.19) is satisfied by $\{M_n\}$, or at a superlinear rate of
convergence when either (1.20) or (1.21) holds. Conditions (1.18) - (1.21)
and symmetry of the operators in $\{M_n\}$ are typical conditions placed on
quasi-Newton operators in the literature (e.g., [25], [2], [26]).

It will be assumed in what follows that at each step of the (GS) at
least one solution to (1.1b) exists. The existence question for (1.1b)
can and should be separated from the convergence rate analysis (e.g.,
topologies suitable for treating the former may be inappropriate for the
latter). In any case the emphasis here is on convergence and rate of
convergence properties of sequences in the (GS), on the assumption that
such sequences exist.

If $f$ is pseudoconvex (and, in particular, convex) then (2.1) is also a
sufficient condition [11]. A differentiable functional $f$ is pseudoconvex
if and only if $\langle f'(x), y - x\rangle < 0$ whenever $f(x) > f(y)$ and
$x, y \in \Omega$.

Most of the results to follow for convex functionals can be extended to
a broad subclass of pseudoconvex functionals discussed in Remark 2.4.

The general scheme outlined below is designed to construct a sequence
$\{x_n\}$ whose limit points are extremals of $f$ on $\Omega$.

Let $x_n$ be the $n$-th approximation to the solution of (P) generated
by the (GS). Recall that in this scheme, the vector $\bar{x}_n$ is determined
by

(2.2)  $\bar{x}_n \in \arg\min_{y \in \Omega} Q(M_n, x_n, y)$,

where the functional $Q(M_n, x_n, y)$ is defined by

(2.3)  $Q(M_n, x_n, y) = \langle f'(x_n), y - x_n\rangle + \frac{1}{2}\langle M_n(y - x_n), y - x_n\rangle$.

It will also be required that the sequence of nonnegative bounded linear
operators $\{M_n\}$ be uniformly bounded above, i.e.,

(2.4)  $\|M_n\| \le K$,  $\forall n \ge 0$, and $K < \infty$.

since if $g(x_n, \bar{x}_n, 1) < \delta$ then (2.5) is satisfied for
$\omega \in [a, b] \subset (0, 1)$ for some $a \ne b$. Normally some sort of
bisection procedure is used to locate an element of such an interval.
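
The statement of rule (2.5) is not reproduced in this excerpt, so the sketch below assumes the usual Goldstein form, which is consistent with how (2.5) is used in the proof of Theorem 2.1: with $g(x_n, \bar{x}_n, \omega) = [f(x_n) - f(x_n + \omega(\bar{x}_n - x_n))] / [\omega\langle f'(x_n), x_n - \bar{x}_n\rangle]$ and a fixed $\delta \in (0, \frac{1}{2})$, take $\omega_n = 1$ if $g(x_n, \bar{x}_n, 1) \ge \delta$, and otherwise any $\omega_n$ with $\delta \le g(x_n, \bar{x}_n, \omega_n) \le 1 - \delta$. The function name `goldstein_step`, its arguments, and the bisection details are hypothetical.

```python
def goldstein_step(phi, slope, delta=0.25, max_bisect=60):
    """Bisection search for a Goldstein-type step (assumed form of rule (2.5)).

    phi(w)  = f(x_n + w * (xbar_n - x_n)), a callable on [0, 1]
    slope   = <f'(x_n), x_n - xbar_n>, assumed strictly positive
    delta   = Goldstein parameter, assumed to lie in (0, 1/2)
    """
    phi0 = phi(0.0)

    def g(w):
        # Ratio of actual decrease to the decrease predicted by the linear model.
        return (phi0 - phi(w)) / (w * slope)

    if g(1.0) >= delta:
        return 1.0  # the full step already gives sufficient decrease
    lo, hi = 0.0, 1.0
    mid = 0.5
    for _ in range(max_bisect):
        mid = 0.5 * (lo + hi)
        gm = g(mid)
        if gm > 1.0 - delta:
            lo = mid  # step too short: decrease is almost exactly linear
        elif gm < delta:
            hi = mid  # step too long: insufficient decrease
        else:
            return mid
    return mid  # fallback if the bracket was not resolved (not expected)
```

Because $g$ tends to 1 as $\omega \to 0$ and $g(\cdot, \cdot, 1) < \delta$ in the second branch, continuity of $g$ guarantees an interval of acceptable steps between the two bracket ends, which is what the bisection exploits.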

The following two lemmas are fundamental in what follows:

Lemma 2.1. Suppose that the sequences $\{r_n\} \subset [0, \infty)$ and
$\{q_n\} \subset [0, \infty)$ satisfy

(2.6)  $r_{n+1} \le r_n - q_n r_n^k$,  $\forall n \ge 0$,

for $k$ a fixed exponent in the range $(1, \infty)$. If

$q_n \ge q > 0$,

then

(2.7)  $r_n = O(n^{-1/(k-1)})$.

Proof. See [10], Lemma 4.1 for the proof.
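
A quick numerical check of the rate in Lemma 2.1 can be made by iterating the extreme case of (2.6), where equality holds and $q_n = q$. The helper name `recursion_tail` and the sample values ($r_0 = 0.5$, $q = 1$, $k = 4$) are assumptions made for the illustration; with $k = 4$ the lemma predicts $r_n = O(n^{-1/3})$, the exponent that appears in Theorem 2.1.

```python
def recursion_tail(r0, q, k, n_steps):
    """Iterate r_{n+1} = r_n - q * r_n**k (equality case of (2.6)) and return r_n."""
    r = r0
    for _ in range(n_steps):
        r = r - q * r ** k
    return r


if __name__ == "__main__":
    # If r_n = O(n^{-1/(k-1)}), the scaled values r_n * n^{1/(k-1)} stay bounded.
    for n in (10, 100, 1000, 10000):
        r_n = recursion_tail(0.5, 1.0, 4, n)
        print(n, r_n * n ** (1.0 / 3.0))
```

The scaled values settle near a constant instead of growing, which is the behavior (2.7) describes.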

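Lemma 2.2. Let $f$ be Fréchet differentiable. Let $M: X \to X^*$ be a
nonnegative operator and $\Omega$ a convex subset of a Banach space $X$. For
any $x \in \Omega$ let $\bar{x} \in \Omega$ satisfy

(2.8)  $\bar{x} \in \arg\min_{y \in \Omega} Q(M, x, y)$.

Let

(2.9)  $\Phi(x) = \{z \in \Omega : \langle f'(x), x - z\rangle \ge 0\}$.

Then for any $z \in \Phi(x)$

(2.10)  $\langle f'(x), x - \bar{x}\rangle \ge \langle f'(x), x - z\rangle + \frac{1}{2}\langle M(\bar{x} - x), \bar{x} - x\rangle$,
        if $\langle M(x - z), x - z\rangle = 0$;

        $\langle f'(x), x - \bar{x}\rangle \ge \frac{1}{2}\min\{\langle f'(x), x - z\rangle, \frac{\langle f'(x), x - z\rangle^2}{\langle M(x - z), x - z\rangle}\} + \frac{1}{2}\langle M(\bar{x} - x), \bar{x} - x\rangle$,
        if $\langle M(x - z), x - z\rangle > 0$.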

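Proof. For any $z \in \Phi(x)$ and any $\theta \in [0, 1]$ the convex
combination $z_{\theta} = x + \theta(z - x)$ is also in $\Omega$ since
$\Omega$ is convex. From (2.8) it follows that for $\theta \in [0, 1]$

$0 \le \langle f'(x), z_{\theta} - x\rangle + \frac{1}{2}\langle M(z_{\theta} - x), z_{\theta} - x\rangle - \langle f'(x), \bar{x} - x\rangle - \frac{1}{2}\langle M(\bar{x} - x), \bar{x} - x\rangle$

or

$\langle f'(x), x - \bar{x}\rangle \ge \langle f'(x), x - z_{\theta}\rangle - \frac{1}{2}\langle M(z_{\theta} - x), z_{\theta} - x\rangle + \frac{1}{2}\langle M(\bar{x} - x), \bar{x} - x\rangle$.

By the linearity of $f'(x)$ and $M$ one can write

(2.11)  $\langle f'(x), x - \bar{x}\rangle \ge \theta\langle f'(x), x - z\rangle - \frac{\theta^2}{2}\langle M(z - x), z - x\rangle + \frac{1}{2}\langle M(\bar{x} - x), \bar{x} - x\rangle$.

The sharpest bound is obtained by maximizing the right side of (2.11) over
$\theta \in [0, 1]$. If $\langle M(x - z), x - z\rangle = 0$, then letting
$\theta = 1$ yields

(2.12)  $\langle f'(x), x - \bar{x}\rangle \ge \langle f'(x), x - z\rangle + \frac{1}{2}\langle M(\bar{x} - x), \bar{x} - x\rangle$.

If $\langle M(x - z), x - z\rangle > 0$, then

$P(\theta) = \theta\langle f'(x), x - z\rangle - \frac{1}{2}\theta^2\langle M(x - z), x - z\rangle$

is a quadratic polynomial with maximum value at

(2.13)  $\bar{\theta} = \frac{\langle f'(x), x - z\rangle}{\langle M(x - z), x - z\rangle}$.

If $\bar{\theta} < 1$, then from (2.11) with $\theta = \bar{\theta}$

(2.14)  $\langle f'(x), x - \bar{x}\rangle \ge \frac{\langle f'(x), x - z\rangle^2}{2\langle M(x - z), x - z\rangle} + \frac{1}{2}\langle M(\bar{x} - x), \bar{x} - x\rangle$.

If $\bar{\theta} \ge 1$, then it follows from (2.13) that

$\langle f'(x), x - z\rangle \ge \langle M(x - z), x - z\rangle$,

and setting $\theta = 1$ in (2.11) yields

(2.15)  $\langle f'(x), x - \bar{x}\rangle \ge \langle f'(x), x - z\rangle - \frac{1}{2}\langle f'(x), x - z\rangle + \frac{1}{2}\langle M(\bar{x} - x), \bar{x} - x\rangle$
        $= \frac{1}{2}\langle f'(x), x - z\rangle + \frac{1}{2}\langle M(\bar{x} - x), \bar{x} - x\rangle$.

The lower bound (2.10) follows from (2.12), (2.14), and (2.15).

QED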
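
The bound of Lemma 2.2 can be spot-checked numerically in a simple finite dimensional setting. The sketch below assumes $X = \mathbb{R}^d$, $\Omega$ a box, and $M = mI$ with $m > 0$, in which case the subproblem (2.8) is solved by clipping $x - f'(x)/m$ to the box. The function name `lemma_2_2_gap` and the random test data are hypothetical; only the vector `g`, standing in for $f'(x)$, enters the inequality.

```python
import random


def lemma_2_2_gap(g, x, z, m, lo=0.0, hi=1.0):
    """Return (left side) - (right side) of the Lemma 2.2 bound for M = m*I on a box.

    g stands for f'(x); z is assumed to satisfy <g, x - z> >= 0, i.e. z in Phi(x).
    """
    d = len(x)
    # Minimizer of <g, y - x> + (m/2)|y - x|^2 over the box: clip x - g/m.
    xbar = [min(max(x[i] - g[i] / m, lo), hi) for i in range(d)]
    lhs = sum(g[i] * (x[i] - xbar[i]) for i in range(d))    # <f'(x), x - xbar>
    a = sum(g[i] * (x[i] - z[i]) for i in range(d))         # <f'(x), x - z>
    mz = m * sum((x[i] - z[i]) ** 2 for i in range(d))      # <M(x - z), x - z>
    mx = m * sum((xbar[i] - x[i]) ** 2 for i in range(d))   # <M(xbar - x), xbar - x>
    if mz == 0.0:
        rhs = a + 0.5 * mx
    else:
        rhs = 0.5 * min(a, a * a / mz) + 0.5 * mx
    return lhs - rhs


if __name__ == "__main__":
    random.seed(1)
    worst = float("inf")
    for _ in range(1000):
        x = [random.random() for _ in range(3)]
        z = [random.random() for _ in range(3)]
        g = [random.uniform(-2.0, 2.0) for _ in range(3)]
        if sum(gi * (xi - zi) for gi, xi, zi in zip(g, x, z)) >= 0.0:
            worst = min(worst, lemma_2_2_gap(g, x, z, random.uniform(0.1, 5.0)))
    print("smallest gap over sampled z in Phi(x):", worst)
```

A nonnegative smallest gap (up to rounding) is what the lemma predicts; the check is a sanity test of the inequality in this special case, not a substitute for the proof.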


Remark 2.1. Lemma 2.2 shows that if at any step of the (GS) it is
determined that $\langle f'(x_n), x_n - \bar{x}_n\rangle = 0$ then $x_n$ is
an extremal. Suppose it were not an extremal. Then for some $y \in \Omega$,
$\langle f'(x_n), y - x_n\rangle < 0$. But that would mean that
$y \in \Phi(x_n)$ and, from Lemma 2.2, that
$\langle f'(x_n), x_n - \bar{x}_n\rangle \ge k\langle f'(x_n), x_n - y\rangle$
for some $k > 0$. Therefore, $\langle f'(x_n), x_n - \bar{x}_n\rangle > 0$
and this contradiction shows that $\langle f'(x_n), y - x_n\rangle \ge 0$,
$\forall y \in \Omega$, whenever
$\langle f'(x_n), x_n - \bar{x}_n\rangle = 0$. Also, for any $x \in \Omega$,
$x \in \Phi(x)$, and letting $z = x$ in (2.11) it follows that

(2.16)  $\langle f'(x), x - \bar{x}\rangle \ge \frac{1}{2}\langle M(\bar{x} - x), \bar{x} - x\rangle \ge 0$,  $\forall x \in \Omega$.

If $x_n$ is an extremal then by (2.1)

(2.17)  $\langle f'(x_n), \bar{x}_n - x_n\rangle \ge 0$.

From (2.16) and (2.17) one can conclude that if $x_n$ is an extremal, then
$\langle f'(x_n), x_n - \bar{x}_n\rangle = 0$. Note also that if at any step
of the process it is determined that

(2.18)  $x_n \in \arg\min_{y \in \Omega} Q(M_n, x_n, y)$,

then $x_n$ is an extremal, since one can choose $\bar{x}_n = x_n$, and that
would make $\langle f'(x_n), x_n - \bar{x}_n\rangle = 0$. If $x_n$ is an
extremal, then (2.18) holds since, if not, there exists a $y \in \Omega$ such
that

$0 < Q(M_n, x_n, x_n) - Q(M_n, x_n, y)$,

but since $Q(M_n, x_n, x_n) = 0$, it follows from the definition (2.3) of
$Q(M_n, x_n, y)$ that

$\langle f'(x_n), y - x_n\rangle < -\frac{1}{2}\langle M_n(y - x_n), y - x_n\rangle \le 0$,

which is a contradiction (see (2.1)). Summarizing the above remarks,
one has that $x_n$ is an extremal if and only if
$\langle f'(x_n), x_n - \bar{x}_n\rangle = 0$ if and only if $x_n$ satisfies
(2.18).


Remark 2.2. Define $\Omega_f$ as the set of minimizers of $f$ on $\Omega$.
If $f$ is convex, then

(2.19)  $\Omega_f \subset \Phi(x)$,  $\forall x \in \Omega$.

This follows from the fact that for convex $f$, if $\xi \in \Omega_f$, then

(2.20)  $\langle f'(x), x - \xi\rangle \ge f(x) - f(\xi) \ge 0$,  $\forall x \in \Omega$.

If $f$ is pseudoconvex, then it follows from the definition of
pseudoconvexity that

(2.21)  $\Omega_f \subset \Phi(x)$  for $x \in \Omega \setminus \Omega_f$.

For functionals $f$ on convex bounded sets $\Omega$ the (GS) will produce
sequences whose limit points are extremals; this is shown in the
following theorem.

Theorem 2.1. Let $\Omega \subset X$ where $X$ is a Banach space and $\Omega$
is convex and bounded. Let $f$ be (Fréchet) differentiable and let $f'$ be
Lipschitz continuous with Lipschitz constant $L$, i.e., there exists an
$L > 0$ such that $\|f'(x) - f'(y)\| \le L\|x - y\|$,
$\forall x, y \in \Omega$. Then $f$ is bounded below, $\{f(x_n)\}$ is
nonincreasing and converges to some limit $\ell \ge \inf_{\Omega} f > -\infty$,
and every limit point of a sequence $\{x_n\}$ generated by the (GS) is an
extremal. If $f$ is also convex, then the values
$r_n = f(x_n) - \inf_{\Omega} f$ decrease monotonically to zero at least at
the rate $r_n = O(n^{-1/3})$, and limit points are minimizers.
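
To see the monotone decrease asserted by Theorem 2.1 in a small case, the following sketch runs the (GS) with $M_n = mI$ on a box in $\mathbb{R}^d$ and records the values $f(x_n)$. It assumes the usual Goldstein form for rule (2.5), which is not reproduced in this excerpt; the function name `general_scheme`, the choice $M_n = mI$, the box constraint, and the bisection search are all assumptions made for the illustration.

```python
def general_scheme(f, grad, x0, m, lo, hi, delta=0.25, iters=100):
    """Sketch of the (GS) with M_n = m*I (m > 0) on the box [lo, hi]^d.

    Returns the list of values f(x_0), f(x_1), ... produced by the run.
    """
    x = list(x0)
    values = [f(x)]
    for _ in range(iters):
        g = grad(x)
        # Subproblem (2.2) for M_n = m*I: clip x - g/m to the box.
        xbar = [min(max(xi - gi / m, lo), hi) for xi, gi in zip(x, g)]
        slope = sum(gi * (xi - bi) for gi, xi, bi in zip(g, x, xbar))
        if slope <= 1e-12:
            break  # <f'(x_n), x_n - xbar_n> = 0: x_n is numerically an extremal

        def ratio(w):
            # Actual decrease divided by the decrease predicted by the linear model.
            trial = [xi + w * (bi - xi) for xi, bi in zip(x, xbar)]
            return (values[-1] - f(trial)) / (w * slope)

        w, a, b = 1.0, 0.0, 1.0
        if ratio(1.0) < delta:
            # Bisection for a step with delta <= ratio <= 1 - delta.
            for _ in range(60):
                w = 0.5 * (a + b)
                r = ratio(w)
                if r > 1.0 - delta:
                    a = w
                elif r < delta:
                    b = w
                else:
                    break
        x = [xi + w * (bi - xi) for xi, bi in zip(x, xbar)]
        values.append(f(x))
    return values
```

For example, with $f(x) = x_1^4 + (x_2 - 2)^2$ on $[-1, 1]^2$, whose minimizer $(0, 1)$ is singular in the first coordinate, the recorded values are nonincreasing and end at the minimum value 1.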
Proof. Using Taylor's formula and the Lipschitz continuity of $f'$ one
can write

(2.22)  $f(x_n) - f(x_n + \omega_n(\bar{x}_n - x_n)) = \int_0^1 \langle f'(x_n + \theta\omega_n(\bar{x}_n - x_n)), \omega_n(x_n - \bar{x}_n)\rangle\, d\theta$
        $= \omega_n\langle f'(x_n), x_n - \bar{x}_n\rangle + \omega_n\int_0^1 \langle f'(x_n + \theta\omega_n(\bar{x}_n - x_n)) - f'(x_n), x_n - \bar{x}_n\rangle\, d\theta$
        $\ge \omega_n\langle f'(x_n), x_n - \bar{x}_n\rangle - \frac{L\omega_n^2}{2}\|\bar{x}_n - x_n\|^2$.

From Goldstein's rule (2.5), if $\omega_n < 1$ then

$1 - \delta \ge \frac{f(x_n) - f(x_{n+1})}{\omega_n\langle f'(x_n), x_n - \bar{x}_n\rangle}$,

and with (2.22) there results

$1 - \delta \ge 1 - \frac{L\omega_n\|\bar{x}_n - x_n\|^2}{2\langle f'(x_n), x_n - \bar{x}_n\rangle}$,

which gives

(2.23)  $\omega_n \ge \min\{1, \frac{2\delta\langle f'(x_n), x_n - \bar{x}_n\rangle}{L\|\bar{x}_n - x_n\|^2}\}$.

Let $D$ = diameter $\Omega = \sup_{x, y \in \Omega}\|x - y\|$. Then

(2.24)  $\omega_n \ge \min\{1, \frac{2\delta}{LD^2}\langle f'(x_n), x_n - \bar{x}_n\rangle\}$.

Also from Goldstein's rule (2.5) one has

(2.25)  $f(x_n) - f(x_{n+1}) \ge \delta\omega_n\langle f'(x_n), x_n - \bar{x}_n\rangle$
        $\ge \delta\min\{\langle f'(x_n), x_n - \bar{x}_n\rangle, \frac{2\delta}{LD^2}\langle f'(x_n), x_n - \bar{x}_n\rangle^2\}$,

and, therefore, $f(x_n) - f(x_{n+1}) \ge 0$ since
$\langle f'(x_n), x_n - \bar{x}_n\rangle \ge 0$ for $n \ge 0$. It follows
easily from the Lipschitz continuity of $f'$ and the boundedness of $\Omega$
that $\inf_{\Omega} f > -\infty$ and so

$\lim_{n \to \infty}(f(x_n) - f(x_{n+1})) = 0$,

which, in turn, implies that

$\lim_{n \to \infty}\langle f'(x_n), x_n - \bar{x}_n\rangle = 0$.


Thus, if $\xi$ is a limit point and $\{x_{n_k}\}$ is a subsequence
converging to $\xi$, then

(2.26)  $\lim_{k \to \infty}\langle f'(x_{n_k}), x_{n_k} - \bar{x}_{n_k}\rangle = 0$.

Suppose that $\xi$ is not an extremal, that is, for some $z \in \Omega$

$\langle f'(\xi), z - \xi\rangle = -a < 0$.

Then by the continuity of $f'$ and the fact that $x_{n_k} \to \xi$ it follows
that

$\lim_{k \to \infty}\langle f'(x_{n_k}), z - x_{n_k}\rangle = -a$,

and, therefore, there exists an $N > 0$ such that for $n_k \ge N$

$\langle f'(x_{n_k}), z - x_{n_k}\rangle < -\frac{a}{2}$

or

$\langle f'(x_{n_k}), x_{n_k} - z\rangle > \frac{a}{2} > 0$.

This implies that $z \in \Phi(x_{n_k})$ for $n_k \ge N$. Therefore, by Lemma
2.2, assuming $\langle M_{n_k}(x_{n_k} - z), x_{n_k} - z\rangle > 0$, one
obtains

(2.27)  $\langle f'(x_{n_k}), x_{n_k} - \bar{x}_{n_k}\rangle \ge \frac{1}{2}\min\{\langle f'(x_{n_k}), x_{n_k} - z\rangle, \frac{\langle f'(x_{n_k}), x_{n_k} - z\rangle^2}{\langle M_{n_k}(x_{n_k} - z), x_{n_k} - z\rangle}\}$
        $\ge \frac{1}{2}\min\{\langle f'(x_{n_k}), x_{n_k} - z\rangle, \frac{\langle f'(x_{n_k}), x_{n_k} - z\rangle^2}{KD^2}\}$
        $\ge \frac{1}{2}\min\{\frac{a}{2}, \frac{a^2}{4KD^2}\} > 0$.

If $\langle M_{n_k}(x_{n_k} - z), x_{n_k} - z\rangle = 0$, then

(2.28)  $\langle f'(x_{n_k}), x_{n_k} - \bar{x}_{n_k}\rangle \ge \frac{a}{2} > 0$.

But (2.27) and (2.28) contradict (2.26) and it follows that

$\langle f'(\xi), z - \xi\rangle \ge 0$,  $\forall z \in \Omega$.

Let  f  be  convex,  then  since  any  extremal  is  a  minimizer  ,  one  can  conclude 

that  any  limit  point  £  £  n  .  For  every  n  >_  0  let  £  3  be  such  that 

f(x  )  ~  f ( z  )  >  ~(f(x  )  —  inf  f )  =  r  .  Then  z  £  $>(x  )  since  by  the 
n  n  —  d  n  ^  d  n  n  n 

convexity  of  f  one  can  write 

(2.29)  <  f'(xn),  xn  -  zn>  >  f(xn)  -  f(zn)  >|  rn  >_  0. 


Therefore, by Lemma 2.2,

(2.30)  $\langle f'(x_n), x_n - \bar{x}_n\rangle \ge \frac{1}{2}\min\{\langle f'(x_n), x_n - z_n\rangle,\ \frac{\langle f'(x_n), x_n - z_n\rangle^2}{KD^2}\} \ge \frac{1}{2}\min\{\frac{r_n}{2},\ \frac{r_n^2}{4KD^2}\} \ge 0$,

and (2.25) yields

(2.31)  $r_n - r_{n+1} = f(x_n) - \inf_\Omega f - f(x_{n+1}) + \inf_\Omega f \ge \delta\min\{\frac{r_n}{4},\ \frac{r_n^2}{8KD^2},\ \frac{\delta r_n^2}{8LD^2},\ \frac{\delta r_n^4}{32LK^2D^6}\}$.

Since $\lim_{n\to\infty}\langle f'(x_n), x_n - \bar{x}_n\rangle = 0$, it follows from (2.30) that $r_n \to 0$, and from (2.31) with $r_n$ sufficiently small one has

$r_{n+1} \le r_n - q r_n^4$, for $q > 0$,

and the rate $r_n = O(n^{-1/3})$ follows from Lemma 2.1.

QED
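The step from a recurrence of the form $r_{n+1} \le r_n - q r_n^p$ to a rate $O(n^{-1/(p-1)})$ can be checked numerically. The following is a minimal sketch, not part of the original analysis: it iterates the equality case of the recurrence and compares the result with the leading-order estimate $((p-1)qn)^{-1/(p-1)}$. The function names and the values of `q`, `r0` and `n` are illustrative assumptions.

```python
def iterate_recurrence(q, p, r0, steps):
    """Iterate r_{n+1} = r_n - q * r_n**p, the equality case of the bound."""
    r = r0
    for _ in range(steps):
        r = r - q * r ** p
    return r


def predicted_rate(q, p, n):
    """Leading-order asymptotic ((p - 1) * q * n) ** (-1 / (p - 1)), valid for p > 1."""
    return ((p - 1) * q * n) ** (-1.0 / (p - 1))


if __name__ == "__main__":
    n = 200_000
    for p in (2, 4):  # p = 2 gives O(1/n); p = 4 gives O(n^(-1/3))
        r_n = iterate_recurrence(q=0.5, p=p, r0=0.5, steps=n)
        print(p, r_n, predicted_rate(0.5, p, n))
```

With $p = 4$ the iterates decay like $n^{-1/3}$, the worst-case rate of Theorem 2.1; with $p = 2$ they decay like $1/n$.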


Remark 2.3. In proving that every limit point is an extremal, the crucial fact is that

$\lim_{n\to\infty} \langle f'(x_n), x_n - \bar{x}_n\rangle = 0$.

As shown in the theorem, this will be true for any operator sequence $\{M_n\}$ in the (GS) provided that $\Omega$ is bounded. The condition of boundedness can be removed if it can be established that $\omega_n \ge \omega > 0$, for all $n \ge 0$, and $\inf_\Omega f > -\infty$. In the next chapter it will be shown that the stepsize parameters are bounded away from zero when the operator sequence $\{M_n\}$ satisfies (1.13) and for certain other methods in the (GS). In these cases the theorem is true for $\Omega = X$ provided $\inf_X f > -\infty$.

Remark 2.4. It is easy to confirm from the proof of Theorem 2.1 and Remark 2.2 that the convergence rate of $O(n^{-1/3})$ for convex functionals can be extended to pseudoconvex functionals which satisfy

(2.32)  $\langle f'(x), x - \xi\rangle \ge k(f(x) - f(\xi))$, $\forall x \in \Omega - \Omega_f$, $\xi \in \Omega_f$, $k > 0$.

In [10], Dunn establishes (2.32) for a large subclass of pseudoconvex functionals which includes certain concave functionals.


3.  Convergence  Rates  for  Convex  Functionals 

In the previous chapter it was shown that for any sequence of nonnegative operators, bounded uniformly above, the "worst case" convergence rate of sequences $\{f(x_n) - \inf_\Omega f\}$ generated by the (GS) is $O(n^{-1/3})$ for convex $f$. In [8], [9] and other references, however, a "worst case" rate of $r_n = O(1/n)$ is established for the conditional gradient method and the gradient projection method. In Theorem 3.1 it is shown that the rate $r_n = O(1/n)$ holds for a large class of methods in the (GS) whose operator sequences satisfy either of the following two additional conditions:

(3.1a)  $\underline{\mu}_n\|u\|^2 \le \langle M_n u, u\rangle \le \bar{\mu}_n\|u\|^2$, $\forall u \in X$, $0 < \underline{\mu}_n \le \bar{\mu}_n < \infty$ for $n \ge 0$,

with

(3.1b)  $\underline{\mu}_n / \bar{\mu}_n \ge a > 0$, for $n \ge 0$,

or, if $f$ is twice Fréchet differentiable,

(3.2)  $0 \le \langle M_n u, u\rangle \le \langle f''(x)u, u\rangle$, $\forall u \in X$, $\forall x \in \Omega$ for $n \ge 0$.

Note that Allwright [15] specifies condition (3.2) in his method. Also, the conditional gradient method, which uses $M_n = 0$ for $n \ge 0$, is admitted by condition (3.2) for convex functionals, since $f''(x)$ is nonnegative on $\Omega$ in this case. Methods whose operator sequences satisfy (3.1) include Barnes' method [16] and the method of gradient projection in Hilbert space in which

(3.3)  $\bar{x}_n = P_\Omega(x_n - \alpha_n \nabla f(x_n))$.

The operation of projection of $x_n - \alpha_n\nabla f(x_n)$ onto $\Omega$, for $\alpha_n > 0$, is equivalent to solving

(3.4)  $\bar{x}_n = \arg\min_{y\in\Omega} \{\langle \nabla f(x_n), y - x_n\rangle + \frac{1}{2\alpha_n}\|y - x_n\|^2\}$,

since (3.3) is defined as

(3.5)  $\bar{x}_n = \arg\min_{y\in\Omega} \|y - (x_n - \alpha_n\nabla f(x_n))\|^2$

or

(3.6)  $\bar{x}_n = \arg\min_{y\in\Omega} \{2\alpha_n\langle \nabla f(x_n), y - x_n\rangle + \|y - x_n\|^2 + \alpha_n^2\|\nabla f(x_n)\|^2\}$,

the solution of which satisfies (3.4). The operator $M_n = \frac{1}{\alpha_n} I$ in (3.4) clearly satisfies (3.1) with $\underline{\mu}_n = \bar{\mu}_n = \frac{1}{\alpha_n}$. The relaxed gradient projection schemes in Demyanov and Rubinov [9] specify explicit upper and lower bounds for the sequence $\{\alpha_n\}$. Condition (3.1) does not need that restriction, although the requirement (2.2) that $\{M_n\}$ be bounded above imposes a lower bound on $\{\alpha_n\}$.
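The equivalence between the projection (3.3) and the quadratic subproblem (3.4) can be illustrated in finite dimensions. The sketch below assumes that the feasible set is the box $[0, 1]^n$ in Euclidean space, so that the projection is a componentwise clip; the box, the function names and the random test data are assumptions made only for illustration.

```python
import random


def project_box(v, lo, hi):
    """Euclidean projection of v onto the box [lo, hi]^n (componentwise clipping)."""
    return [min(max(vi, lo), hi) for vi in v]


def subproblem_value(grad, x, y, alpha):
    """Objective of (3.4): <grad, y - x> + ||y - x||^2 / (2 * alpha)."""
    d = [yi - xi for yi, xi in zip(y, x)]
    linear = sum(g * di for g, di in zip(grad, d))
    quadratic = sum(di * di for di in d) / (2.0 * alpha)
    return linear + quadratic


def projection_step(grad, x, alpha, lo=0.0, hi=1.0):
    """The point x_bar of (3.3): project x - alpha * grad onto the box."""
    return project_box([xi - alpha * gi for xi, gi in zip(x, grad)], lo, hi)


if __name__ == "__main__":
    random.seed(0)
    x = [random.random() for _ in range(4)]
    grad = [random.uniform(-3.0, 3.0) for _ in range(4)]
    x_bar = projection_step(grad, x, alpha=0.7)
    best = subproblem_value(grad, x, x_bar, 0.7)
    # No randomly sampled feasible point should do better than x_bar.
    worst_gap = min(
        subproblem_value(grad, x, [random.random() for _ in range(4)], 0.7) - best
        for _ in range(1000)
    )
    print(worst_gap >= 0.0)
```

Sampling feasible points does not prove optimality, but it gives a quick consistency check that the projected point minimizes the objective in (3.4).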

It is interesting to note that the method of gradient projection is imbedded in a larger family of Hilbert space variable metric gradient projection methods in which at each step the projection operation and the determination of the gradient are carried out with a new inner product. Thus, if $M_n$ is an operator satisfying (3.1a) then as an operator in the (GS) it can be assumed that $M_n$ is symmetric, i.e., $\langle M_n x, y\rangle = \langle M_n y, x\rangle$, $\forall x, y \in X$. This is true since $\langle M_n x, x\rangle = \langle \frac{M_n + M_n^*}{2} x, x\rangle$, where $M_n^*$ is the adjoint operator of $M_n$ on $X$ and $\frac{M_n + M_n^*}{2}$ is symmetric. Therefore, one can consider the sequence $\{M_n\}$ equivalent in the (GS) to $\{\frac{M_n + M_n^*}{2}\}$. Then, with $M_n$ symmetric and positive definite, a new inner product is defined by

$\langle x, y\rangle_{M_n} = \langle M_n x, y\rangle$.

Although the Fréchet derivative is the same for all of the related norms, the representation of $f'$ changes with the inner product, since

$f'(x)[y] = \langle \nabla f(x), y\rangle = \langle M_n M_n^{-1}\nabla f(x), y\rangle = \langle M_n^{-1}\nabla f(x), y\rangle_{M_n}$.

The variable metric version of (3.3) is now

$\bar{x}_n = P_\Omega^{M_n}(x_n - M_n^{-1}\nabla f(x_n))$

or equivalently

$\bar{x}_n = \arg\min_{y\in\Omega} \{\langle \nabla f(x_n), y - x_n\rangle + \frac{1}{2}\langle M_n(y - x_n), y - x_n\rangle\}$.


An example of variable metric projection which is commonly practiced in computations in $E^n$ is the technique of "scaling", in which the operators $M_n$ are represented by diagonal matrices $D_n$. In one scheme, for example, entries on the diagonal are second partial derivatives, $\partial^2 f / \partial x_i^2$, of the functional $f$. Although such ad hoc methods can make matters worse, they can also accelerate convergence, and on simple sets such as orthants and boxes, the process is no more difficult to carry out than "standard" gradient projection. Notice that the condition number $\underline{\mu}_n / \bar{\mu}_n$ of a diagonal matrix is the ratio of smallest to largest diagonal entry; therefore, if that ratio is bounded away from zero for $n \ge 0$, then the scaling procedure satisfies (3.1).
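On a box the scaled subproblem is separable, which is why scaling costs no more than ordinary projection there. A minimal sketch, assuming a box $[lo, hi]^n$ and a strictly positive diagonal (the function names are hypothetical): each coordinate of the minimizer is the clipped scalar step $x_i - g_i / d_i$, and the ratio in (3.1b) is the smallest diagonal entry divided by the largest.

```python
def scaled_projection_step(grad, x, diag, lo=0.0, hi=1.0):
    """Minimize <grad, y - x> + 0.5 * sum(d_i * (y_i - x_i)**2) over the box.

    The objective is separable and convex in each coordinate, so the
    minimizer is the unconstrained scalar minimizer clipped to [lo, hi].
    """
    if min(diag) <= 0.0:
        raise ValueError("diagonal entries must be positive")
    return [min(max(xi - gi / di, lo), hi) for xi, gi, di in zip(x, grad, diag)]


def condition_ratio(diag):
    """Ratio of smallest to largest diagonal entry, as in condition (3.1b)."""
    return min(diag) / max(diag)
```

Taking every diagonal entry equal to $1/\alpha_n$ recovers the unscaled projection step (3.3).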

The "unrelaxed" gradient projection methods considered by Levitin and Poljak [8] and Dunn [10] in which $\omega_n = 1$ for $n \ge 0$ can be considered as part of the (GS) provided $\delta$ is sufficiently small. In both cases the methods used to select the sequence $\{\alpha_n\}$ are such that at each step Goldstein's rule will select $\omega_n = 1$ if $\delta$ is small enough. For example, Levitin and Poljak require that $\alpha_n$ be chosen from the interval $[\varepsilon_1, \frac{2}{L + \varepsilon_0}]$ for $\varepsilon_1, \varepsilon_0 > 0$. From the definition of $g(x_n, \bar{x}_n, \omega)$ in Goldstein's rule (2.3) and from (2.17) one obtains

(3.7)  $g(x_n, \bar{x}_n, 1) \ge 1 - \frac{L\|x_n - \bar{x}_n\|^2}{2\langle \nabla f(x_n), x_n - \bar{x}_n\rangle}$

when $x_n$ is not an extremal. Also, $\bar{x}_n$ minimizes the functional $Q(\frac{1}{\alpha_n}I, x_n, \cdot)$ over $\Omega$, and is therefore an extremal of $Q(\frac{1}{\alpha_n}I, x_n, \cdot)$ satisfying (2.1). In this case (2.1) reduces to

(3.8)  $\langle \nabla f(x_n) + \frac{1}{\alpha_n}(\bar{x}_n - x_n), z - \bar{x}_n\rangle \ge 0$, $\forall z \in \Omega$.

By letting $z = x_n$ in (3.8) one can write

(3.9)  $\langle \nabla f(x_n), x_n - \bar{x}_n\rangle \ge \frac{1}{\alpha_n}\|x_n - \bar{x}_n\|^2$.

If $0 < \alpha_n \le \frac{2}{L + \varepsilon_0}$, then (3.7) and (3.9) give

$g(x_n, \bar{x}_n, 1) \ge 1 - \frac{L\alpha_n}{2} \ge 1 - \frac{L}{L + \varepsilon_0}$.

Thus, Goldstein's rule yields $\omega_n = 1$ if $\delta < 1 - \frac{L}{L + \varepsilon_0}$. Once again the lower bound $0 < \varepsilon_1 \le \alpha_n$ gives the uniform upper bound required by the (GS).

The following theorem gives a "worst case" convergence rate estimate for methods in the (GS) when either (3.1) or (3.2) is satisfied. As noted above, a large number of well known methods are included in this subclass of the (GS).

Theorem 3.1. Let $\Omega \subset X$ where $X$ is a Banach space and $\Omega$ is convex and bounded with diam $\Omega = D$. Let $f$ be convex and differentiable with $f'$ Lipschitz continuous on $\Omega$, and let $L$ be a Lipschitz constant for $f'$. Then $\inf_\Omega f > -\infty$ and if the (GS) operator sequence $\{M_n\}$ satisfies either condition (3.1) or (3.2) then the value $r_n = f(x_n) - \inf_\Omega f$ will decrease monotonically to zero and $r_n = O(1/n)$.

Proof. As in Theorem 2.1, lines (2.24) and (2.25), one has

(3.10)  $f(x_n) - f(x_{n+1}) \ge \delta\min\{1,\ \frac{2\delta\langle f'(x_n), x_n - \bar{x}_n\rangle}{L\|x_n - \bar{x}_n\|^2}\}\langle f'(x_n), x_n - \bar{x}_n\rangle$.

Also, $\inf_\Omega f > -\infty$ follows easily from the Lipschitz continuity of $f'$ and the boundedness of $\Omega$. Therefore, for every $n \ge 0$ let $z_n \in \Omega$ be such that $f(x_n) - f(z_n) \ge \frac{1}{2}(f(x_n) - \inf_\Omega f) = \frac{1}{2} r_n$. Then, as in Theorem 2.1, since $f$ is convex one can write

(3.11)  $\langle f'(x_n), x_n - z_n\rangle \ge f(x_n) - f(z_n) \ge \frac{1}{2} r_n \ge 0$,

and $z_n \in \Phi(x_n)$ for all $n \ge 0$. If (3.1) holds for $\{M_n\}$, and if $x_n$ is not an extremal, then Lemma 2.2 gives


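(3.12)  $\langle f'(x_n), x_n - \bar{x}_n\rangle \ge \frac{1}{2}[\min\{\langle f'(x_n), x_n - z_n\rangle,\ \frac{\langle f'(x_n), x_n - z_n\rangle^2}{\langle M_n(x_n - z_n), x_n - z_n\rangle}\} + \langle M_n(x_n - \bar{x}_n), x_n - \bar{x}_n\rangle]$.

With (3.11), (3.12) becomes

(3.13)  $\langle f'(x_n), x_n - \bar{x}_n\rangle \ge \frac{1}{2}[\min\{\frac{r_n}{2},\ \frac{r_n^2}{4\langle M_n(x_n - z_n), x_n - z_n\rangle}\} + \langle M_n(x_n - \bar{x}_n), x_n - \bar{x}_n\rangle]$,

and since all terms in (3.13) are positive it follows that both

(3.14a)  $\langle f'(x_n), x_n - \bar{x}_n\rangle \ge \frac{1}{2}\min\{\frac{r_n}{2},\ \frac{r_n^2}{4\langle M_n(x_n - z_n), x_n - z_n\rangle}\}$

and

(3.14b)  $\langle f'(x_n), x_n - \bar{x}_n\rangle \ge \frac{1}{2}\langle M_n(x_n - \bar{x}_n), x_n - \bar{x}_n\rangle$.

Therefore, using appropriate combinations of (3.14a) and (3.14b) one has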


(3.15) 


LK  -  xjl2 


1  ^  min{ 


1+LH xn  ‘  ULH 


r  <  M  (x  -  x  ) ,  x  -  x  ) 
n  n  n  n  n  n  _  .  ^ 

x  -  x  In  M  (x  -  z),x  -  z  > 
n  n"  n  n  n  n  n 


,  i.  l  M 

1  .  r  n  n  -n 

>  T?r  mim - o  *  - 

IK  -  UJI  Xn  ‘  ZJ 


>  0. 


“•n2  2 

>  - -  =  c,r  , 

-  16lD2  1  n 


Line  (3.1^a)  can  be  written  as 


(3.16)  <f'(x  ),  x  -  x  >  >  §■  min{~-, 

n  n  n  -  2  2  kKD2 


and  then  (3.16),  (3.15) ,  and  (3.10)  yield 

(3.17)  f(xn)  -  f(xn+1)  >  6  min{-f,  SSc^2}. 

If (3.2) holds, then

(3.18)  $\langle f'(x_n), x_n - \bar{x}_n\rangle \ge \frac{r_n}{2}$.

This is true since with Taylor's formula and (3.2) one obtains for any $y \in \Omega$

$f(y) = f(x_n) + \langle f'(x_n), y - x_n\rangle + \int_0^1 \langle f''(x_n + \theta(y - x_n))(y - x_n), y - x_n\rangle (1 - \theta)\, d\theta$

$\ge f(x_n) + \langle f'(x_n), y - x_n\rangle + \frac{1}{2}\langle M_n(y - x_n), y - x_n\rangle$

$= f(x_n) + Q(M_n, x_n, y)$

$\ge f(x_n) + Q(M_n, x_n, \bar{x}_n)$.

With $y = z_n$, it follows that

$-Q(M_n, x_n, \bar{x}_n) \ge f(x_n) - f(z_n) \ge \frac{r_n}{2}$

or

$\langle f'(x_n), x_n - \bar{x}_n\rangle \ge \frac{r_n}{2} + \frac{1}{2}\langle M_n(\bar{x}_n - x_n), \bar{x}_n - x_n\rangle$,

and since $M_n$ is a nonnegative operator, (3.18) results. Combining (3.18) with (3.10) one has

(3.19)  $f(x_n) - f(x_{n+1}) \ge \delta\min\{\frac{r_n}{2},\ \frac{\delta r_n^2}{2LD^2}\}$.

By Theorem 2.1, $\{r_n\}$ decreases monotonically to zero, and with (3.17) and (3.19), for $r_n$ sufficiently small, it follows that

$f(x_n) - f(x_{n+1}) \ge \delta c_2 r_n^2$, for some $c_2 > 0$.

Therefore,

$r_{n+1} \le r_n - \delta c_2 r_n^2$,

and Lemma 2.1 gives the rate estimate $r_n = O(1/n)$.

QED
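A small numerical illustration of the monotone decrease of $r_n$ for gradient projection with a Goldstein-type step follows. It is a sketch under stated assumptions rather than the scheme analyzed above: the objective is a hypothetical separable quadratic on the box $[0, 1]^3$, the operator is $M_n = \frac{1}{\alpha} I$ with a fixed $\alpha$, and Goldstein's rule is simplified to halving $\omega$ until the ratio $g(x_n, \bar{x}_n, \omega) \ge \delta$ (only the lower test is enforced).

```python
def f(x, c, t):
    """Separable convex quadratic 0.5 * sum(c_i * (x_i - t_i)**2)."""
    return 0.5 * sum(ci * (xi - ti) ** 2 for ci, xi, ti in zip(c, x, t))


def grad_f(x, c, t):
    return [ci * (xi - ti) for ci, xi, ti in zip(c, x, t)]


def clip01(v):
    """Projection onto the box [0, 1]^n."""
    return [min(max(vi, 0.0), 1.0) for vi in v]


def gradient_projection(x, c, t, alpha, delta=0.25, iters=50):
    """Return the values f(x_0), f(x_1), ... along a relaxed gradient projection run."""
    values = [f(x, c, t)]
    for _ in range(iters):
        g = grad_f(x, c, t)
        x_bar = clip01([xi - alpha * gi for xi, gi in zip(x, g)])
        # <f'(x_n), x_n - x_bar_n>; zero exactly when x_n is an extremal
        decrease = sum(gi * (xi - bi) for gi, xi, bi in zip(g, x, x_bar))
        if decrease <= 1e-12:
            break
        omega = 1.0
        trial = x_bar
        while omega > 1e-12:
            trial = [xi + omega * (bi - xi) for xi, bi in zip(x, x_bar)]
            ratio = (values[-1] - f(trial, c, t)) / (omega * decrease)
            if ratio >= delta:  # simplified Goldstein test (lower bound only)
                break
            omega *= 0.5
        x = trial
        values.append(f(x, c, t))
    return values


if __name__ == "__main__":
    c, t = [1.0, 4.0, 10.0], [0.3, 1.5, -0.2]
    vals = gradient_projection([1.0, 0.0, 1.0], c, t, alpha=0.1)
    print(vals[0], vals[-1])
```

For this data the minimizer over the box is the clipped target $(0.3, 1, 0)$ with minimum value $0.7$, so $r_n$ is `vals[n] - 0.7`.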


Remark 3.1. The proof of inequality (3.18) in the theorem is due to Allwright [15].

Dunn [10] has shown that sharper convergence rate upper bounds can be determined for the gradient projection method (1.5) in the presence of conditions on the growth rate of the functional $f$ near an extremal $\xi$, i.e., condition (1.10). Condition (1.11), which expresses structural properties of the set $\Omega$ near $\xi$, implies condition (1.10), since when $f$ is convex,

(3.20)  $\langle f'(\xi), x - \xi\rangle \le f(x) - f(\xi)$

is true for any minimizer $\xi$. For the conditional gradient method Dunn [4], [14] requires the condition (1.11) to establish higher rates of convergence.

In Theorem 3.2 it is shown that whenever it can be established that the stepsizes are bounded away from zero, the growth rate of the functional $f$ near $\xi$, i.e., condition (1.10), is enough to give a hierarchy of linear and sublinear convergence rate estimates for the sequence $\{r_n\}$. When $f'$ is Lipschitz continuous, condition (1.13), which requires that operators in the sequence $\{M_n\}$ be uniformly positive definite, is sufficient to prove that $\omega_n \ge \omega > 0$, as follows from line (2.23) and Lemma 2.2.

Condition (1.13) is not required by Dunn [10] for the gradient projection method (1.5); however, the condition $\omega_n = 1$ is inherent in the method, and upper bounds for the sequence $\{\alpha_n\}$ in the gradient projection schemes in [8] and [9] are equivalent to condition (1.13).

When $f$ is convex, condition (1.10) implies that $\xi$ is a unique minimizer. It is possible, however, that the set $\Omega_f$ consists of more than one vector, in which case a more appropriate condition is

(3.21a)  $f(x) - \inf_\Omega f \ge \gamma d(x)^\nu$, $\forall x \in \Omega$,

where

(3.21b)  $d(x) = \inf_{y\in\Omega_f} \|x - y\|$.

Note that conditions (1.10), (1.11), and (3.21) require that $\Omega_f$ be nonempty.


Theorem 3.2. Let $\Omega \subset X$ where $X$ is a Banach space and $\Omega$ is convex. Let $f$ be convex and differentiable with $f'$ Lipschitz continuous on $\Omega$, and let $L$ be a Lipschitz constant for $f'$. Let the (GS) be such that

(3.22)  $\omega_n \ge \omega > 0$, $\forall n \ge 0$,

and let condition (3.21) hold with $\nu \in [2, \infty)$. If $\nu = 2$ then $\{r_n\}$ converges linearly, i.e., $r_n = O(\lambda^n)$ for some $\lambda \in (0, 1)$. If $\nu > 2$, then

(3.23)  $r_n = O(n^{-\nu/(\nu-2)})$.

Proof. From Goldstein's rule, (2.5), and (3.22) there results

(3.24)  $f(x_n) - f(x_{n+1}) \ge \delta\omega\langle f'(x_n), x_n - \bar{x}_n\rangle$.

As stated in Remark 2.2, for convex $f$ and any $y \in \Omega_f$ one has

$\langle f'(x_n), x_n - y\rangle \ge r_n \ge 0$.

Therefore, by Lemma 2.2 and for any $y \in \Omega_f$ it follows that if $x_n \ne y$

(3.25)  $\langle f'(x_n), x_n - \bar{x}_n\rangle \ge \begin{cases} r_n & \text{if } \langle M_n(x_n - y), x_n - y\rangle = 0 \\ \frac{1}{2}\min\{r_n,\ \frac{r_n^2}{\langle M_n(x_n - y), x_n - y\rangle}\} & \text{if } \langle M_n(x_n - y), x_n - y\rangle > 0 \end{cases}$

$\ge \frac{1}{2}\min\{r_n,\ \frac{r_n^2}{K\|x_n - y\|^2}\}$.

Let $\{y_n\} \subset \Omega_f$ be such that for $n \ge 0$

$\|x_n - y_n\| \le 2d(x_n) = 2\inf_{y\in\Omega_f}\|x_n - y\|$.

Then for $\nu = 2$, condition (3.21) yields

(3.26)  $\frac{r_n^2}{K\|x_n - y_n\|^2} \ge \frac{r_n^2}{4Kd(x_n)^2} \ge \frac{\gamma}{4K} r_n$,

and for $\nu > 2$, since $r_n^{2/\nu} \ge \gamma^{2/\nu} d(x_n)^2$, it follows that

(3.27)  $\frac{r_n^2}{K\|x_n - y_n\|^2} \ge \frac{\gamma^{2/\nu}}{4K} r_n^{2-2/\nu}$.

Combining (3.24) - (3.26) one has with $\nu = 2$

$f(x_n) - f(x_{n+1}) \ge \frac{\delta\omega}{2} r_n \min\{1,\ \frac{\gamma}{4K}\} = q r_n$, with $q \in (0, 1)$.

Therefore,

$r_{n+1} \le (1 - q) r_n$,

which implies

$r_n = O((1 - q)^n) = O(\lambda^n)$.

If $\nu > 2$, then (3.24), (3.25) and (3.27) yield

$f(x_n) - f(x_{n+1}) \ge \frac{\delta\omega}{2}\min\{r_n,\ \frac{\gamma^{2/\nu}}{4K} r_n^{2-2/\nu}\}$,

which, for $r_n$ sufficiently small, implies

$r_{n+1} \le r_n - q r_n^{2-2/\nu}$, where $q = \frac{\delta\omega\gamma^{2/\nu}}{8K}$.

The rate (3.23) follows from Lemma 2.1.

QED


As with condition (1.10), condition (1.11) implies that the minimizer $\xi$ is unique; however, an extension to a condition like (3.21) is not possible here.

The following lemma, which is a modification of Lemma 2.5 in [22], will be needed in the next two theorems.

Lemma 3.1. Let $\Omega \subset X$ where $X$ is a Banach space and $\Omega$ is convex. Let $f$ be differentiable, $f'$ Lipschitz continuous, and let $L$ be a Lipschitz constant for $f'$. Let condition (1.11) hold at $\xi$ with $\nu \in [1, \infty)$. If $\{x_n\}$ is generated by the (GS), then

(3.28)  $\|\bar{x}_n - \xi\|^{\nu-1} \le (\frac{L + K}{\gamma})\|x_n - \xi\|$ for $n \ge 0$,

where $K$ is the uniform bound on the norms of the operators (see (2.4)).

Proof. From (2.2) one has for $n \ge 0$

$0 \le Q(M_n, x_n, \xi) - Q(M_n, x_n, \bar{x}_n)$

or

(3.29)  $0 \le \langle f'(x_n), \xi - x_n\rangle + \frac{1}{2}\langle M_n(x_n - \xi), x_n - \xi\rangle - \langle f'(x_n), \bar{x}_n - x_n\rangle - \frac{1}{2}\langle M_n(x_n - \bar{x}_n), x_n - \bar{x}_n\rangle$.

From the positivity and the linearity of the operators $M_n$, there results

(3.30)  $\frac{1}{2}\langle M_n(x_n - \xi), x_n - \xi\rangle - \frac{1}{2}\langle M_n(x_n - \bar{x}_n), x_n - \bar{x}_n\rangle$

$= \frac{1}{2}\langle M_n(x_n - \xi), x_n - \xi\rangle - \frac{1}{2}\langle M_n(x_n - \bar{x}_n), x_n - \xi\rangle - \frac{1}{2}\langle M_n(x_n - \bar{x}_n), \xi - \bar{x}_n\rangle$

$= \frac{1}{2}\langle M_n(\bar{x}_n - \xi), x_n - \xi\rangle + \frac{1}{2}\langle M_n(x_n - \xi), \bar{x}_n - \xi\rangle - \frac{1}{2}\langle M_n(\bar{x}_n - \xi), \bar{x}_n - \xi\rangle$.

Furthermore, (3.29), (3.30) and the Lipschitz continuity of $f'$ yield

(3.31)  $\frac{1}{2}\langle M_n(\bar{x}_n - \xi), \bar{x}_n - \xi\rangle \le \langle f'(x_n), \xi - \bar{x}_n\rangle + \frac{1}{2}\langle M_n(\bar{x}_n - \xi), x_n - \xi\rangle + \frac{1}{2}\langle M_n(x_n - \xi), \bar{x}_n - \xi\rangle$

or

(3.32)  $\frac{1}{2}\langle M_n(\bar{x}_n - \xi), \bar{x}_n - \xi\rangle + \langle f'(\xi), \bar{x}_n - \xi\rangle \le \langle f'(\xi) - f'(x_n), \bar{x}_n - \xi\rangle + K\|x_n - \xi\|\,\|\bar{x}_n - \xi\| \le (L + K)\|x_n - \xi\|\,\|\bar{x}_n - \xi\|$.

Finally, by the positivity of $M_n$ and condition (1.11) one has

$\gamma\|\bar{x}_n - \xi\|^\nu \le (L + K)\|x_n - \xi\|\,\|\bar{x}_n - \xi\|$.

QED

When $\{\omega_n\}$ can decrease to zero, condition (1.11) is used with Lemma 3.1 to estimate its rate of decrease, as shown in the following two theorems:


Theorem 3.3. Let $\Omega \subset X$ where $X$ is a Banach space and $\Omega$ is convex and bounded. Let $f$ be convex and differentiable with $f'$ Lipschitz continuous on $\Omega$, and let $L$ be a Lipschitz constant for $f'$. Let the (GS) operator sequence $\{M_n\}$ satisfy either condition (3.1) or (3.2), and let condition (1.11) hold at $\xi$ with $\nu \in [2, \infty)$. If $\nu = 2$, then $r_n = O(\lambda^n)$ for some $\lambda \in (0, 1)$, and if $\nu > 2$, then

(3.33)  $r_n = O(n^{-\nu(\nu-1)/(\nu(\nu-1)-2)})$.

Proof. As in the proof of Theorem 3.1, line (3.10), one can write

(3.34)  $f(x_n) - f(x_{n+1}) \ge \delta\min\{\langle f'(x_n), x_n - \bar{x}_n\rangle,\ \frac{2\delta\langle f'(x_n), x_n - \bar{x}_n\rangle^2}{L\|x_n - \bar{x}_n\|^2}\}$.

Clearly, (3.11) is satisfied with $z_n = \xi$, and when (3.2) holds, (3.18) and (3.34) yield

(3.35)  $f(x_n) - f(x_{n+1}) \ge \frac{\delta}{2}\min\{r_n,\ \frac{\delta r_n^2}{L\|x_n - \bar{x}_n\|^2}\}$.


From  (3.1^a)  and  (3.15)  in  Theorem  3.1,  where  (3.1)  holds  one  has 


f<xn}  "  f(xn+l} 


6  •  / 1 
>_  g  , 


Sr 


Sar 


2  n’  2K||x  -  cl!2’  **l||  x  -  if  Ul|U  -  Cl!2 


■}. 


(3.36) 


When  f  is  convex,  (l.ll)  implies  (1.10),  and  as  in  Theorem  3.2,  it 
follows  for  v  >  2 


,3.37, 

l!*n  * 


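Using Lemma 3.1 with the triangle inequality one can write

$\|x_n - \bar{x}_n\| \le \|x_n - \xi\| + \|\bar{x}_n - \xi\|$

$\le \|x_n - \xi\| + (\frac{L + K}{\gamma})^{1/(\nu-1)}\|x_n - \xi\|^{1/(\nu-1)}$

$= (\|x_n - \xi\|^{(\nu-2)/(\nu-1)} + (\frac{L + K}{\gamma})^{1/(\nu-1)})\|x_n - \xi\|^{1/(\nu-1)}$.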


Therefore, 


(3.38) 


lxn  -  ”  (d('’-2,/(v-1)  .  (iii) 


T7Tv-J 


-  e|p/(v-1) 


>  c,r 
—  3  n 


2-2/(v(v-l) ) 


where 


62/(v(v-l)) 

”3  (  v-2 )  /  (  v-i)  +  (L  +  K)171^1^ 


and  D  =  diam  ft. 


Since

$2 - \frac{2}{\nu} \le 2 - \frac{2}{\nu(\nu-1)}$ for $\nu > 2$,

it follows that

(3.39)  $r_n^{2-2/\nu} \ge r_n^{2-2/(\nu(\nu-1))}$

for $r_n$ sufficiently small. By Theorem 2.1, $r_n \to 0$ and, therefore, (3.35) and (3.36) can be written as

$r_{n+1} \le r_n - q r_n^{2-2/(\nu(\nu-1))}$, $q > 0$,

for $r_n$ sufficiently small. The result (3.33) follows from Lemma 2.1 with $\nu > 2$, and when $\nu = 2$, $r_n = O(\lambda^n)$ for some $\lambda \in (0, 1)$.

QED
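To compare the sublinear estimates (3.23) and (3.33) for a given growth exponent, the two rate exponents can be tabulated exactly. The helper names below are hypothetical; the formulas are the ones stated in Theorems 3.2 and 3.3 for $\nu > 2$.

```python
from fractions import Fraction


def exponent_theorem_3_2(nu):
    """k in r_n = O(n**(-k)) from (3.23): k = nu / (nu - 2), for nu > 2."""
    nu = Fraction(nu)
    if nu <= 2:
        raise ValueError("nu must exceed 2; nu = 2 gives a linear rate")
    return nu / (nu - 2)


def exponent_theorem_3_3(nu):
    """k in r_n = O(n**(-k)) from (3.33): k = nu(nu - 1) / (nu(nu - 1) - 2), for nu > 2."""
    nu = Fraction(nu)
    if nu <= 2:
        raise ValueError("nu must exceed 2; nu = 2 gives a linear rate")
    m = nu * (nu - 1)
    return m / (m - 2)


if __name__ == "__main__":
    for nu in (3, 4, 6):
        print(nu, exponent_theorem_3_2(nu), exponent_theorem_3_3(nu))
```

For example, $\nu = 4$ gives exponent $2$ under the hypotheses of Theorem 3.2 and $6/5$ under those of Theorem 3.3, reflecting the weaker control of the stepsizes in the latter.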


Up to this point the emphasis has been placed on determining convergence rates for the sequence $\{r_n\} = \{f(x_n) - \inf_\Omega f\}$. It is possible that the sequence of iterates $\{x_n\}$ has no limit points, and a rate on the sequence $\{r_n\}$ is the best one can do. Also, in most applications approximating the minimum value of the functional $f$ is the primary objective.

Note that conditions (1.10) and (1.11) give convergence rates for the sequence $\{\|x_n - \xi\|\}$ when a rate for $\{r_n\}$ is known. If $r_n \ge \gamma\|x_n - \xi\|^\nu$ for $\nu > 2$, then if $r_n = O(n^{-k})$, it follows that $\|x_n - \xi\| = O(n^{-k/\nu})$. Similarly, if $\nu = 2$, then linear convergence of $\{r_n\}$ implies linear convergence for $\{\|x_n - \xi\|\}$.

In the following lemma, it is shown that condition (1.10) for $\nu \in [1, 2)$ implies that condition (1.11) holds at $\xi$. (This is also shown in [10] and [H].) In Theorem 3.4 this fact is used to show superlinear convergence or finite termination for the sequence $\{\|x_n - \xi\|\}$ for any operator sequence $\{M_n\}$ in the (GS).

Lemma 3.2. Let $\Omega \subset X$ where $X$ is a Banach space and $\Omega$ is convex and bounded. Let $f$ be differentiable with $f'$ Lipschitz continuous on $\Omega$ and for some $\xi \in \Omega$ let $f$ satisfy

(3.40)  $f(x) - f(\xi) \ge \Gamma\|x - \xi\|^\nu$, $\forall x \in \Omega$, and $\nu \in [1, 2)$.

Then

(3.41)  $\langle f'(\xi), x - \xi\rangle \ge \gamma\|x - \xi\|^\nu$, $\forall x \in \Omega$, and some $\gamma > 0$.

Proof. By Taylor's formula and the Lipschitz continuity of $f'$ (see line (2.22)),

(3.42)  $f(x) - f(\xi) \le \langle f'(\xi), x - \xi\rangle + R\|x - \xi\|^2$, $\forall x \in \Omega$, $R < \infty$,

and with (3.40) one has

$\langle f'(\xi), x - \xi\rangle \ge \Gamma\|x - \xi\|^\nu - R\|x - \xi\|^2 = (\Gamma - R\|x - \xi\|^{2-\nu})\|x - \xi\|^\nu$.

Therefore,

(3.43)  $\langle f'(\xi), x - \xi\rangle \ge \frac{\Gamma}{2}\|x - \xi\|^\nu$, $\forall x \in \Omega \cap B_\rho(\xi)$,

where $B_\rho(\xi)$ is a closed ball of radius $\rho = (\frac{\Gamma}{2R})^{1/(2-\nu)}$. Let $\rho < D = \sup_{y\in\Omega}\|y - \xi\|$ and let $\gamma = \frac{\Gamma}{2}(\frac{\rho}{D})^{\nu-1}$. By the convexity of $\Omega$ it follows that for every $x \in \Omega - B_\rho(\xi)$ there exists a number $\tau \in (\rho, D]$ and a vector $y \in \Omega$ with $\|y - \xi\| = \rho$ such that $x - \xi = \frac{\tau}{\rho}(y - \xi)$. Therefore, (3.43) yields for $\nu \in [1, 2)$

$\langle f'(\xi), x - \xi\rangle = \langle f'(\xi), \frac{\tau}{\rho}(y - \xi)\rangle$

$\ge \frac{\tau}{\rho}\,\frac{\Gamma}{2}\|y - \xi\|^\nu$

$= \frac{\Gamma}{2}(\frac{\rho}{\tau})^{\nu-1}\|\frac{\tau}{\rho}(y - \xi)\|^\nu$

$\ge \frac{\Gamma\rho^{\nu-1}}{2D^{\nu-1}}\|x - \xi\|^\nu$

$= \gamma\|x - \xi\|^\nu$.

Since $\frac{\Gamma}{2} \ge \gamma$, the lemma is proved for all $x \in \Omega$.

QED


Theorem 3.4. Let $\Omega \subset X$ where $X$ is a Banach space and $\Omega$ is convex and bounded. Let $f$ be convex and differentiable with $f'$ Lipschitz continuous and let $L$ be a Lipschitz constant for $f'$. If for $\xi \in \Omega_f$,

(3.44)  $f(x) - f(\xi) \ge \gamma\|x - \xi\|^\nu$, for some $\nu \in [1, 2)$,

and if $\{x_n\}$ is a sequence generated by the (GS), then $\{\|x_n - \xi\|\}$ converges to zero superlinearly, i.e., either $x_n = \xi$ for some $n \ge 0$ or

(3.45)  $\lim_{n\to\infty} \frac{\|x_{n+1} - \xi\|}{\|x_n - \xi\|} = 0$.

If $\nu = 1$ in (3.44), then $x_N = \xi$ for some $N < \infty$.


Proof.  By  Theorem  2.1,  r^  -*■  0  and,  therefore,  xn  -*•  C  follows  from  (3.1*3). 


Line  (2.23)  gives 


Ixn  -  XqII  1  -  4\  +  llxn  *  5||) 


1  (IU„  -  5||  +  (i-r^)1/<U"1,||x  -  C||1/<v'1)) 

Y 

=  (1  *  (^)1/(v'1Vn  -  ell(2-v,/<v-1,)||x„ 

Y 

<(1,(kiK,1/<',-1>D(2-'')/(-1))||xil-c|| 

Y 


-  clJI  xn  -  €11- 


The  lower  bound  (3.^8)  now  becomes 


u  >  mind,  - g - - - - r>. 

2Lck  K  -  k^ck‘ 2(|xn  -  cir 


and  with  (3.^^)  there  results 


w  i  min{l,  - Si - — ,  - -  - o7g-  - y) . 

2Lcul!xn  -  Clt2^  -  Cll2^7 

Since  ||x  -  c||  -►  0,  it  follows  that  un  =  1  for  n  >_  N.^  for  some  >  0. 
Thus,  for  n  >_  N^,  (3.^9)  yields 

NVl  "  ^  .  ,L  *  _ ||(2-v) /( v-l) 

-[^prrri*— >  K-'U 

which  implies  (3.^5).  When  v  =  1,  the  finite  termination  of  the  process 


follows  directly  from  (3.^9),  since 


h6 


9|^-C||1(L  +  K)|tcn-S||||in-C|| 

is  true  for  n  >.  0 ,  and  if  ||xn  -  5j|  <  then  ||xn  -  5||  =  0. 

QED 

Remark 3.2. Dunn [4], [17] has shown that conditions (1.11) and (1.10) can be established for certain extremals found in problems from optimal control theory. Let $U$ be a nonempty convex set in $\mathbb{R}^m$, and let the constraint set $\Omega$ be the set of functions

$\Omega = \{\text{measurable } u(\cdot): [0, 1] \to U\}$.

Here $f$, the functional to be minimized, is defined and differentiable on a neighborhood of $\Omega$ in one of the spaces $L^p([0, 1], \mathbb{R}^m)$ and problem (P) becomes

$\min_{u(\cdot)\in\Omega} f(u(\cdot))$.

Condition (1.11) at an extremal function $\xi(\cdot)$ becomes

(3.50)  $\langle f'(\xi(\cdot)), u(\cdot) - \xi(\cdot)\rangle \ge \gamma\|u(\cdot) - \xi(\cdot)\|_p^\nu$,

where, in this case,

$\langle f'(\xi(\cdot)), u(\cdot) - \xi(\cdot)\rangle = \int_0^1 y(t)\cdot(u(t) - \xi(t))\,dt$

with $y(\cdot)$ the (unique) representor of the Fréchet derivative $f'(\xi(\cdot))$ in the conjugate space $L^q([0, 1], \mathbb{R}^m)$ with $q = p/(p - 1)$ and

$\|u(\cdot) - \xi(\cdot)\|_p = (\int_0^1 |u(t) - \xi(t)|^p\,dt)^{1/p}$.

In [17] it is shown that if $U$ is a bounded polyhedron in $\mathbb{R}^m$ and if $\xi(\cdot)$ is a certain type of "bang-bang" extremal, then values for $\nu$ can be calculated directly from such factors as the number $p$ and the growth properties of a scalar "switching function" $s(t)$ on $[0, 1]$ determined by $y(t)$ and $U$. For example, if $f$ is convex, $U$ is $[-1, 1] \subset \mathbb{R}^1$, and $y(t)$ is continuous, nondecreasing, and has an isolated zero at $t = \frac{1}{2}$, then


f 


“1  ,  “  —  L  *  g  ' 


t  e  [0,  i) 

(3.51)  C(t )  e  [-1,  1],  t  =  | 

t  e  (§-,  1] , 


V. 


+1 


and $\nu$ in (3.50) has the value 2 and 4 in $L^1$ and $L^2$ respectively. Thus by Theorem 3.2 the gradient projection method, which is limited to the Hilbert space $L^2$, would generate a sequence of iterates whose convergence rate estimate is $f(u_n(\cdot)) - f(\xi(\cdot)) = O(1/n^2)$; a simple example with minimizer (3.51) shows that this estimate cannot be improved. On the other hand, the conditional gradient method makes sense in $L^1$, and for $\nu = 2$ this method converges linearly according to Theorem 3.3. Note that the $L^1$ analog of Hilbert space gradient projection in which $M_n = \frac{1}{\alpha_n} I$ in the (GS) is a method in which $Q_n(y) = \langle f'(x_n), y - x_n\rangle + \frac{1}{2\alpha_n}\|y - x_n\|_1^2$ is minimized at each step. This method is not formally in the (GS); however, Lemma 2.2 and Theorems 2.1, 3.1, and 3.2 could be modified to give the same results for this method with $0 < a \le \alpha_n \le b < \infty$, $\forall n \ge 0$, and in the example (3.51) above the convergence rate in $L^1$ would be at least linear. Its feasibility in $L^1$, however, is questionable.

Remark 3.3. The results of this chapter are readily extended to pseudoconvex functionals satisfying (2.32) in Remark 2.4, when $\Omega_f$ is nonempty.


4.  Newton's  Method 


When $f$ is convex, $f''$ is nonnegative and symmetric (see, e.g., [22]), and Newton's method, in which $M_n = f''(x_n)$, is in the (GS). The "worst case" convergence rate of $r_n = O(n^{-1/3})$ given by Theorem 2.1 when $\|f''(x)\| \le K$, $\forall x \in \Omega$, seems far too conservative, since in the regular cases for which convergence rate estimates exist, the rates for Newton's method are clearly superior to those of such first order schemes as the gradient projection method and the conditional gradient method. On the other hand, $f''(x_n)$ need not satisfy either condition (3.1) or (3.2), and so Theorem 3.1, which gives the rate $O(1/n)$, does not necessarily apply to Newton's method. The fact that $M_n = f''(x_n)$ in Newton's method, however, can be employed to strengthen a number of the fundamental inequalities used in previous theorems, and convergence rate estimates for Newton's method will be shown to be better than those of the first order methods in a number of less than regular cases.

The following two lemmas improve inequalities (2.23) and (3.28) when $f''$ is Lipschitz continuous on $\Omega$, i.e., when there exists an $L > 0$ such that $\|f''(x) - f''(y)\| \le L\|x - y\|$, $\forall x, y \in \Omega$.

Lemma  4.1.  Let  ft  C  X  be  convex,  where  X  is  a  Banach  space.  Let  f  be 
convex  and  twice  differentiable,  f"  Lipschitz  continuous  on  ft  with  L  a 
Lipschitz  constant  for  f"  on  ft,  and  let  f"  satisfy  ||f"(x)||  <_  K  <  ® 

Vx  €  ft.  If  {xn>  is  a  sequence  generated  by  the  (GS)  with  =  f"(xn) 
then  for  any  y  belonging  to  t.he  set  $(xn)  in  (2.9), 
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(U.l) 


Proof. 

Taylor' 

(U.2) 


where 


0)  > 
n  — 


mints , 


<  r(xn),xn-y> 

4  <f“(xnHxn-y),*n-y> 


L||x  -x  Ip 
n  nu 


if  < f"(xn)(xn-y) ,xn-y>  >  0 


2<5<  f'(x  ),x  -y>  | 
mints,  (“■ - 5“ — )  }, 


if  <  f"(xn)(xn-y),xn-y>  =  0. 


If  oj  >1  and  x  is  not  an  extremal,  then  Goldstein's  rule  and 
n  n  * 


s  formula  yield 


1  -  &  > 


f(x  )  -  fix  +  mix  -  x  )) 
n  n  n  n  n 

oj  <  f '  (x  ) ,  x  -  x  > 
n  n  *  n  n 


<u  <  f '  (x  ) ,  x  -  x  > - -  ( f " ( c  )  (x  -  x  ) ,  x  -  x  > 

n  n  '  n  n _ 2 _  n  n  n  n  n 

ui  <  f '  ( x  ) ,  x  -  x  > 
n  n  n  n 


C  =  x  +  9  (x  +  cj(x  -  x  )  -  x  ) 
n  n  n  n  n  n  n  n 


=  x  +  9ui(x  -  x)  for  some  9  G  [0,  l] 

n  n  n  n  n  n 


It  follows  from  the  Lipschitz  continuity  of  f"  that 


co 


(4.3) 


1  -  6  >  1  - 


T<f"(3cnH>n  -  *n  -  V 

<f'(ltn>-  xn  -  V 


^<(f'(xn)  -  f"un))un-;o). 

rf'(V-  \  -  V 


>  1  -  (0 


<f"(x  )(x  -  x  ),  x  -  X  >  +  Leo  0j|x  -  x  I 

_ _ n  n  n’n  n  n  rr  n  n1 


2<  f'(x  ),  x  -  x  > 
n’n  n 


and  one  obtains 


(4.4) 


W  * 
n  — 


26<f,(xn}»  Xn  -  V 


<  f "  ( x  )  ( x  -  x  ) ,  x  -  x  >  +  Leo  e  j|x  -  X  I 
n  n  n’n  n  n  n’^n  n1 


Let  y  e  $(x  )  and  assume  <  f"(x  Hx  -  y),  x  -  y>  >  0.  Then  Lemma  2.2 
n  n  n  n 

and  (4.1c)  yield 


<f'(x  ),x  -y> 

5  min{<  f'(x),x  -y>,  - £ - 2_ - r  }  +  6<  f"(x  )(x  -x  ) 

n’n  <  f"(x„  )(x  -y)  ,x  -y>  n,v  n  n 

(4.5)  co  >  — - - — - - - - 

<  f"(x  )(x  -x  )  ,x  -x  >  +  Leo  9  |(x  -x  || 
n  n  n  "  M  n  n  *  “  n  n  n 


n  n 


n  n ,r  n  n 1 
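Suppose $\omega_n < \delta$. Then from (4.5) one has

$$
\begin{aligned}
\delta\langle f''(x_n)(\bar x_n - x_n), \bar x_n - x_n\rangle + \omega_n^{2} L\|\bar x_n - x_n\|^{3}
&\ge \omega_n\langle f''(x_n)(\bar x_n - x_n), \bar x_n - x_n\rangle + \omega_n^{2} L\theta_n\|\bar x_n - x_n\|^{3} \\
&\ge \delta\,\min\left\{\langle f'(x_n), x_n - y\rangle,\ \frac{\langle f'(x_n), x_n - y\rangle^{2}}{\langle f''(x_n)(x_n - y), x_n - y\rangle}\right\}
 + \delta\langle f''(x_n)(\bar x_n - x_n), \bar x_n - x_n\rangle,
\end{aligned}
\tag{4.6}
$$

or

$$
\omega_n^{2} \ge \frac{\delta\,\min\left\{\langle f'(x_n), x_n - y\rangle,\ \dfrac{\langle f'(x_n), x_n - y\rangle^{2}}{\langle f''(x_n)(x_n - y), x_n - y\rangle}\right\}}{L\|\bar x_n - x_n\|^{3}}.
\tag{4.7}
$$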


If $\langle f''(x_n)(x_n - y), x_n - y\rangle = 0$, then using Lemma 2.2 and the above
argument one obtains

$$
\omega_n^{2} \ge \frac{2\delta\,\langle f'(x_n), x_n - y\rangle}{L\|\bar x_n - x_n\|^{3}},
\tag{4.8}
$$

and the result (4.1) follows from (4.7) and (4.8).

QED
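The step-length test used in this proof can be tried numerically. The Python fragment below is only an illustrative sketch for scalar problems: it evaluates the ratio $g(x, \bar x, \omega)$ appearing in (4.2) and accepts $\omega = 1$ when $g(x, \bar x, 1) \ge \delta$. The fallback branch (a bisection for some $\omega$ with $\delta \le g \le 1 - \delta$) is an assumed reading of Goldstein's rule (2.5), which is stated outside this section; the function names and the default `delta = 0.25` are hypothetical.

```python
def goldstein_ratio(f, df, x, x_bar, omega):
    """g(x, x_bar, omega) = [f(x) - f(x + omega (x_bar - x))] / (omega f'(x) (x - x_bar))."""
    decrease = f(x) - f(x + omega * (x_bar - x))
    slope = omega * df(x) * (x - x_bar)
    return decrease / slope


def goldstein_step(f, df, x, x_bar, delta=0.25, max_bisect=60):
    """Return 1 if the full step passes the test; otherwise bisect for an
    omega in (0, 1) with delta <= g <= 1 - delta (assumed form of the rule)."""
    if goldstein_ratio(f, df, x, x_bar, 1.0) >= delta:
        return 1.0
    lo, hi = 0.0, 1.0          # g is close to 1 for tiny omega and below delta at omega = 1
    omega = 0.5
    for _ in range(max_bisect):
        omega = 0.5 * (lo + hi)
        g = goldstein_ratio(f, df, x, x_bar, omega)
        if g > 1.0 - delta:
            lo = omega         # step too short
        elif g < delta:
            hi = omega         # step too long
        else:
            break              # delta <= g <= 1 - delta
    return omega
```

For $f(x) = x^2$, $x = 1$ and the deliberately overshooting target $\bar x = -3$, the ratio is $g = 1 - 2\omega$, so the full step is rejected and the bisection stops at a step inside the admissible band.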

Lemma 4.2. Let $\Omega \subset X$ be convex, where $X$ is a Banach space. Let $f$ be
convex and twice differentiable, $f''$ Lipschitz continuous on $\Omega$ with $L$ a
Lipschitz constant for $f''$ on $\Omega$, and let $f''$ satisfy $\|f''(x)\| \le K < \infty$,
$\forall x \in \Omega$. Let $\xi$ be the unique minimizer of $f$ in $\Omega$ and suppose (1.11) holds,
i.e.,

$$
\langle f'(\xi), x - \xi\rangle \ge \gamma\|x - \xi\|^{\nu}, \quad \forall x \in \Omega,\ \nu \in [1, \infty).
$$

If $\{x_n\}$ is a sequence generated by the (GS) with $M_n = f''(x_n)$ then

$$
\|\bar x_n - \xi\|^{\nu - 1} \le \frac{L}{\gamma}\,\|x_n - \xi\|^{2}.
\tag{4.9}
$$


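Proof. From (3.31) in Lemma 3.1 one has, with $M_n = f''(x_n)$,

$$
\begin{aligned}
\tfrac12\langle f''(x_n)(\bar x_n - \xi), \bar x_n - \xi\rangle + \langle f'(\xi), \bar x_n - \xi\rangle
&\le \langle f'(\xi) - f'(x_n), \bar x_n - \xi\rangle
 + \tfrac12\langle f''(x_n)(\bar x_n - \xi), x_n - \xi\rangle \\
&\quad + \tfrac12\langle f''(x_n)(x_n - \xi), \bar x_n - \xi\rangle.
\end{aligned}
\tag{4.10}
$$

Since the second derivative operator is symmetric and nonnegative, one
obtains with the Mean Value Theorem

$$
\langle f'(\xi), \bar x_n - \xi\rangle \le \langle f''(\zeta_n)(\xi - x_n), \bar x_n - \xi\rangle
 + \langle f''(x_n)(x_n - \xi), \bar x_n - \xi\rangle,
\tag{4.11}
$$

where $\zeta_n = x_n + \theta_n(\xi - x_n)$ for some $\theta_n \in [0, 1]$. From (1.11) and the
Lipschitz continuity of $f''$ there results

$$
\gamma\|\bar x_n - \xi\|^{\nu} \le L\,\|x_n - \zeta_n\|\,\|x_n - \xi\|\,\|\bar x_n - \xi\|
\le L\,\|x_n - \xi\|^{2}\,\|\bar x_n - \xi\|.
$$

QED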


Using the results of Lemma 4.1 in the proof of Theorem 2.1 one can
easily obtain a new "worst case" rate of convergence of $r_n = O(n^{-1/2})$
for Newton's method when $f$ is convex. Lemma 4.2 can be used to prove
that a hierarchy of convergence rate estimates exists as in Theorem 3.3
when condition (1.11) holds at a minimizer; however, the estimates would
still be worse than those in Theorem 3.3. The following assumption on
the functional $f$ near a minimizer $\xi$ will give a hierarchy of convergence
rate estimates superior to those in Theorem 3.3, and Lemma 4.3 will
prove that the assumption is actually true for a large class of convex
functionals.
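The rate estimates in this chapter are all read off from recurrences of the form $r_{n+1} \le r_n - q\,r_n^{1+\alpha}$. As a rough numerical illustration, the sketch below iterates the equality case and checks that $n^{1/\alpha} r_n$ stays bounded. It assumes (consistently with how Lemma 2.1 is applied later in this chapter, but without quoting its statement) that such a recurrence gives $r_n = O(n^{-1/\alpha})$; the function names and the starting values are hypothetical.

```python
def simulate_recurrence(r0, q, alpha, steps):
    """Iterate r_{n+1} = r_n - q * r_n**(1 + alpha), the equality case of the bound."""
    r = r0
    history = [r]
    for _ in range(steps):
        r = r - q * r ** (1.0 + alpha)
        history.append(r)
    return history


def scaled_tail(history, alpha):
    """Return n**(1/alpha) * r_n at the last index n; this stays bounded
    when r_n = O(n**(-1/alpha))."""
    n = len(history) - 1
    return n ** (1.0 / alpha) * history[-1]


if __name__ == "__main__":
    # alpha = 2 corresponds to the exponent -1/2; alpha = 1/2 to the exponent -2.
    for alpha in (2.0, 0.5):
        hist = simulate_recurrence(0.5, 1.0, alpha, 10_000)
        print(alpha, scaled_tail(hist, alpha))
```

In this model the scaled tail approaches $(\alpha q)^{-1/\alpha}$, which is the constant one obtains from the continuous analogue $\dot r = -q\,r^{1+\alpha}$.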

Assumption (A). If $f$ is convex and $\xi$ is a minimizer of $f$ on $\Omega$ then for
some $\rho > 0$, some $c > 0$ and all $x \in \Omega \cap B_\rho(\xi)$

$$
\langle f'(x), x - \xi\rangle^{2} \ge c\,\bigl(f(x) - f(\xi)\bigr)\,\langle f''(x)(x - \xi), x - \xi\rangle.
$$

Although (A) is not true for all convex functionals, as an example of
Dunn [14] in Hilbert space shows, it is conjectured that (A) is
true whenever condition (1.10) holds at a unique minimizer of $f$ in $\Omega$,
i.e., when the functional near the extremal grows like $\|x - \xi\|^{\nu}$ for
some $\nu \in [1, \infty)$. The following lemma supports this conjecture.
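A one-dimensional check may help fix ideas. The following worked example is not taken from the thesis; it assumes $X = \mathbb{R}$, $\Omega = [-1, 1]$, $\xi = 0$ and $f(x) = |x|^{p}$ with $p \ge 2$, so that (1.10) holds with $\nu = p$ and $\gamma = 1$.

```latex
\begin{align*}
\langle f'(x), x - \xi\rangle^{2} &= \bigl(p\,|x|^{p}\bigr)^{2} = p^{2}|x|^{2p}, \\
\bigl(f(x) - f(\xi)\bigr)\,\langle f''(x)(x - \xi), x - \xi\rangle
  &= |x|^{p}\cdot p(p-1)\,|x|^{p} = p(p-1)\,|x|^{2p}, \\
\text{so (A) holds for every } \rho > 0 \text{ with } c &= \frac{p}{p-1}.
\end{align*}
```

For $p = 4$ this gives $16x^{8} \ge c \cdot 12x^{8}$, i.e. $c = 4/3$.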

Lemma 4.3. Let $\Omega \subset X$ be convex where $X$ is a Banach space. Let $f$ be
convex, five times differentiable with $f^{(5)}$ Lipschitz continuous on $\Omega$
with $L$ a Lipschitz constant for $f^{(5)}$ on $\Omega$, and suppose for some $\xi \in \Omega$

$$
f(x) - f(\xi) \ge \gamma\|x - \xi\|^{5}, \quad \forall x \in \Omega.
$$

Then $f$ satisfies assumption (A) at $\xi$.

Proof. The proof can be found after the proof of Theorem 4.1.

Remark 4.1. Similar proofs can be given for $i$-times differentiable convex
functionals with Lipschitz continuous $i$-th derivatives when
$f(x) - f(\xi) \ge \gamma\|x - \xi\|^{i}$ for $i = 3, 4, 5$.

Theorem 4.1. Let $\Omega \subset X$ be convex and bounded where $X$ is a Banach space.
Let $f$ be convex and at least twice differentiable with $f''$ Lipschitz
continuous on $\Omega$ with $L$ a Lipschitz constant for $f''$ on $\Omega$, and let $f''$
satisfy $\|f''(x)\| \le K < \infty$, $\forall x \in \Omega$. Furthermore, suppose that assumption
(A) holds at a unique minimizer $\xi \in \Omega$. Finally, let $\{x_n\}$ be a sequence
generated by the (GS) with $M_n = f''(x_n)$. Then:

(i) If (1.10) holds for some $\nu \in [2, \infty)$, the values $r_n = f(x_n) - f(\xi)$
satisfy $r_n = O(1/n^{2})$ (at least).

(ii) If (1.11) holds for $\nu \in (3, \infty)$, then $r_n = O\bigl(n^{-2\nu(\nu-1)/(\nu(\nu-1)-6)}\bigr)$.

(iii) If (1.11) holds for $\nu = 3$, then $r_n = O(\lambda^{n})$ for some $\lambda \in (0, 1)$.

(iv) If (1.11) holds for $\nu \in [1, 3)$, then the sequence $\{\|x_n - \xi\|\}$
converges superlinearly, i.e., either $x_n = \xi$ beyond some $N$, or else
$\lim_{n \to \infty} \|x_{n+1} - \xi\| / \|x_n - \xi\| = 0$.
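To get a feel for how the estimate in case (ii) varies with the growth exponent, one can tabulate it. The helper below is a hypothetical convenience, not part of the thesis; it assumes the exponent $p = 2\nu(\nu-1)/(\nu(\nu-1)-6)$, which is the reciprocal of the quantity $(\nu(\nu-1)-6)/(2\nu(\nu-1))$ appearing in the recurrence used in the proof of (ii).

```python
def newton_rate_exponent(nu):
    """Exponent p such that r_n = O(n**(-p)) in case (ii); requires nu > 3."""
    if nu <= 3:
        raise ValueError("case (ii) requires nu > 3")
    growth = nu * (nu - 1.0)
    return 2.0 * growth / (growth - 6.0)


if __name__ == "__main__":
    for nu in (3.5, 4.0, 6.0, 10.0, 100.0):
        print(nu, newton_rate_exponent(nu))
```

The exponent blows up as $\nu$ decreases to 3, where case (iii) takes over with a geometric rate, and it decreases toward 2 as $\nu \to \infty$, matching the rate in case (i).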


Proof. In all cases $r_n \to 0$ by Theorem 2.1, and since condition (1.10)
or (1.11) is satisfied here, it follows that $\|x_n - \xi\| \to 0$. It can be
assumed, therefore, that for some $c > 0$ the inequality

$$
\langle f'(x_n), x_n - \xi\rangle^{2} \ge c\,r_n\,\langle f''(x_n)(x_n - \xi), x_n - \xi\rangle
\tag{4.12}
$$

is true uniformly in $n \ge 0$.

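(i) From Lemma 2.2, Goldstein's rule (2.5), (4.12) and the convexity
of $f$ one has for $n \ge 0$

$$
f(x_n) - f(x_{n+1}) \ge \delta\omega_n\langle f'(x_n), x_n - \bar x_n\rangle
\ge \delta\omega_n c_1 r_n, \quad \text{where } c_1 = \tfrac12\min\{1, c\}.
\tag{4.13}
$$

Condition (4.12) and Lemma 4.1 with $y = \xi$, $\forall n \ge 0$, now yield

$$
\omega_n \ge \min\left\{\delta,\ \left(\frac{2\delta c_1 r_n}{L\|\bar x_n - x_n\|^{3}}\right)^{1/2}\right\}
\ge \min\left\{\delta,\ \left(\frac{2\delta c_1 r_n}{L D^{3}}\right)^{1/2}\right\}, \quad \text{where } D = \operatorname{diam}\Omega.
\tag{4.14}
$$

For $n$ sufficiently large, (4.13) and (4.14) give

$$
r_{n+1} \le r_n - q\,r_n^{3/2}
\quad \text{with} \quad
q = \left(\frac{2}{L D^{3}}\right)^{1/2}(\delta c_1)^{3/2}.
$$

This implies $r_n = O(1/n^{2})$, by Lemma 2.1.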
(ii) By using (1.11) with $\nu \in (3, \infty)$ in Lemma 4.2 one obtains

$$
\|\bar x_n - \xi\| \le \left(\frac{L}{\gamma}\right)^{1/(\nu-1)}\|x_n - \xi\|^{2/(\nu-1)} \quad \text{for } n \ge 0.
\tag{4.15}
$$

Therefore,  by  the  triangle  inequality  one  obtains,  as  in  Theorem  3.3, 


provided $x_n \ne \xi$. Also, since $f$ is convex, (1.11) implies (1.10) and one
can write

$$
r_n \ge r_n^{\,1 - 6/(\nu(\nu-1))}\,\gamma^{6/(\nu(\nu-1))}\,\|x_n - \xi\|^{6/(\nu-1)}.
\tag{4.17}
$$

The inequalities (4.16), (4.17), and (4.14) now give

$$
\omega_n \ge \min\left\{\delta,\ c_2\,r_n^{(\nu(\nu-1)-6)/(2\nu(\nu-1))}\right\},
\tag{4.18}
$$


where 


Finally, (4.18) and (4.13) yield for $n$ sufficiently large

$$
r_{n+1} \le r_n - q\,r_n^{\,1 + (\nu(\nu-1)-6)/(2\nu(\nu-1))}
\quad \text{with} \quad q = \delta c_2 c_1.
$$

The desired result now follows from Lemma 2.1.

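(iii) From (4.13) it is easy to see that if $\omega_n \ge \underline{\omega} > 0$, $\forall n \ge 0$, then
$r_n = O(\lambda^{n})$ with $\lambda = \max\{0,\ 1 - \delta c_1\underline{\omega}\}$. When (1.11) holds with $\nu = 3$, it
follows from Lemma 4.2 that

$$
\|\bar x_n - \xi\| \le \left(\frac{L}{\gamma}\right)^{1/2}\|x_n - \xi\|,
\tag{4.19}
$$

and using the triangle inequality one finds that

$$
\|\bar x_n - x_n\|^{3} \le \bigl(\|x_n - \xi\| + \|\bar x_n - \xi\|\bigr)^{3}
\le \left(1 + \left(\frac{L}{\gamma}\right)^{1/2}\right)^{3}\|x_n - \xi\|^{3}.
\tag{4.20}
$$

Since $r_n \ge \langle f'(\xi), x_n - \xi\rangle \ge \gamma\|x_n - \xi\|^{3}$, (4.20) and (4.14) combine to
yield

$$
\omega_n \ge \min\left\{\delta,\ \left(\frac{2\delta c_1\gamma}{L\bigl(1 + (L/\gamma)^{1/2}\bigr)^{3}}\right)^{1/2}\right\} = \underline{\omega} > 0.
$$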


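(iv) When (1.11) holds with $\nu \in (1, 3)$, then Lemma 4.2 states that if
$x_n \ne \xi$, then

$$
\frac{\|\bar x_n - \xi\|}{\|x_n - \xi\|} \le \left(\frac{L}{\gamma}\right)^{1/(\nu-1)}\|x_n - \xi\|^{(3-\nu)/(\nu-1)}.
\tag{4.21}
$$

It will now be shown that $\omega_n = 1$ (and consequently $x_{n+1} = \bar x_n$) for $n$
sufficiently large; this result, together with (4.21), implies that $x_n$
converges superlinearly to $\xi$.

Goldstein's rule selects $\omega_n = 1$ if $g(x_n, \bar x_n, 1) \ge \delta$ for the given
$\delta \in (0, \tfrac12)$. From the definition of $g(x, \bar x, \omega)$ and Taylor's formula one
has

$$
g(x_n, \bar x_n, 1) = \frac{f(x_n) - f(\bar x_n)}{\langle f'(x_n), x_n - \bar x_n\rangle}
= 1 - \frac{\tfrac12\langle f''(x_n + \theta_n(\bar x_n - x_n))(\bar x_n - x_n), \bar x_n - x_n\rangle}{\langle f'(x_n), x_n - \bar x_n\rangle}
\quad \text{for some } \theta_n \in [0, 1],
\tag{4.22}
$$

provided $x_n \ne \xi$. Since $f''$ is Lipschitz continuous, it follows that

$$
g(x_n, \bar x_n, 1) \ge 1 - \frac{\langle f''(x_n)(\bar x_n - x_n), \bar x_n - x_n\rangle}{2\langle f'(x_n), x_n - \bar x_n\rangle}
 - \frac{L\|\bar x_n - x_n\|^{3}}{2\langle f'(x_n), x_n - \bar x_n\rangle}.
\tag{4.23}
$$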


For any operator $M$ and each fixed $x \in \Omega$, $Q(M, x, \cdot)$ is a functional on
$\Omega$ and if $\bar x$ minimizes $Q(M, x, \cdot)$ on $\Omega$ then $\bar x$ is an extremal of $Q(M, x, \cdot)$.
If $M$ is symmetric, then (2.1) yields

$$
\langle f'(x) + M(\bar x - x),\ z - \bar x\rangle \ge 0, \quad \forall z \in \Omega.
$$

In particular, for $z = x$ this gives

$$
\langle f'(x), x - \bar x\rangle \ge \langle M(\bar x - x), \bar x - x\rangle.
\tag{4.24}
$$
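In one dimension the subproblem can be solved in closed form, which makes (4.24) easy to check by hand or by machine. The sketch below assumes $X = \mathbb{R}$, $\Omega = [lo, hi]$ and a scalar $m \ge 0$ standing in for $M$, with $Q(M, x, y) = \langle f'(x), y - x\rangle + \tfrac12\langle M(y - x), y - x\rangle$; all function names are hypothetical.

```python
def q_model(grad, m, x, y):
    """Q(m, x, y) = grad * (y - x) + 0.5 * m * (y - x)**2 for scalars."""
    return grad * (y - x) + 0.5 * m * (y - x) ** 2


def subproblem_minimizer(grad, m, x, lo, hi):
    """Minimize Q(m, x, .) over the interval [lo, hi], assuming m >= 0."""
    if m > 0.0:
        y = x - grad / m              # unconstrained minimizer of the parabola
        return min(max(y, lo), hi)    # clipping is exact for a 1-D convex parabola
    # m == 0: Q is linear in y, so an endpoint is optimal
    if grad > 0.0:
        return lo
    if grad < 0.0:
        return hi
    return x


def inequality_4_24_holds(grad, m, x, x_bar, tol=1e-12):
    """Check grad * (x - x_bar) >= m * (x_bar - x)**2 up to rounding."""
    return grad * (x - x_bar) >= m * (x_bar - x) ** 2 - tol
```

With `grad = 2`, `m = 1`, `x = 0.5` on `[0, 1]` the minimizer is clipped to the endpoint 0, and (4.24) holds strictly ($1 \ge 0.25$); for an interior minimizer it holds with equality.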


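Since $f''(x_n)$ is symmetric and nonnegative one can use (4.24) in (4.23)
to get

$$
g(x_n, \bar x_n, 1) \ge \frac12 - \frac{L\|\bar x_n - x_n\|^{3}}{2\langle f'(x_n), x_n - \bar x_n\rangle}.
\tag{4.25}
$$

From Lemma 2.2 with assumption (A) one has

$$
\langle f'(x_n), x_n - \bar x_n\rangle \ge c_1 r_n \ge c_1\gamma\|x_n - \xi\|^{\nu}, \quad \text{where } c_1 = \tfrac12\min\{1, c\}.
\tag{4.26}
$$

Furthermore, Lemma 4.2 and the triangle inequality yield

$$
\begin{aligned}
\|\bar x_n - x_n\|^{3} &\le \bigl(\|x_n - \xi\| + \|\bar x_n - \xi\|\bigr)^{3} \\
&\le \left(\|x_n - \xi\| + \left(\frac{L}{\gamma}\right)^{1/(\nu-1)}\|x_n - \xi\|^{2/(\nu-1)}\right)^{3} \\
&\le \left(1 + \left(\frac{L}{\gamma}\right)^{1/(\nu-1)} D^{(3-\nu)/(\nu-1)}\right)^{3}\|x_n - \xi\|^{3}.
\end{aligned}
\tag{4.27}
$$

Together, (4.27), (4.26), and (4.25) give

$$
g(x_n, \bar x_n, 1) \ge \frac12 - \frac{L\left(1 + (L/\gamma)^{1/(\nu-1)} D^{(3-\nu)/(\nu-1)}\right)^{3}}{2c_1\gamma}\,\|x_n - \xi\|^{3-\nu},
$$

and since $\|x_n - \xi\| \to 0$ one can conclude that for any fixed $\delta \in (0, \tfrac12)$,
there is a sufficiently large $N(\delta)$ such that

$$
g(x_n, \bar x_n, 1) \ge \delta \quad \text{for } n \ge N(\delta).
$$

If (1.11) holds with $\nu = 1$ then $x_n = \xi$ for $n \ge N$ for some $N < \infty$ by
Theorem 3.4.

QED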
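The behaviour described by Theorem 4.1 can be observed on scalar toy problems. The sketch below is an illustration under stated assumptions, not the author's implementation: it takes $X = \mathbb{R}$ and $\Omega = [lo, hi]$, uses $M_n = f''(x_n)$, and applies a step-length rule that accepts $\omega_n = 1$ when $g(x_n, \bar x_n, 1) \ge \delta$ and otherwise bisects for $\delta \le g \le 1 - \delta$ (an assumed reading of Goldstein's rule). The name `newton_gs_1d` and both test functions are hypothetical.

```python
def newton_gs_1d(f, df, d2f, x0, lo, hi, delta=0.25, iters=20):
    """Scalar sketch of the (GS) with M_n = f''(x_n) on the interval [lo, hi]."""
    xs = [x0]
    x = x0
    for _ in range(iters):
        grad, m = df(x), d2f(x)
        if m > 0.0:
            x_bar = min(max(x - grad / m, lo), hi)   # minimizer of the quadratic model
        else:  # degenerate model: best endpoint, or stay put if the gradient vanishes
            x_bar = lo if grad > 0.0 else hi if grad < 0.0 else x
        if x_bar == x:                               # x is an extremal
            break

        def g(omega):
            return (f(x) - f(x + omega * (x_bar - x))) / (omega * grad * (x - x_bar))

        omega = 1.0
        if g(1.0) < delta:                           # full step rejected
            a, b = 0.0, 1.0
            for _ in range(60):
                omega = 0.5 * (a + b)
                if g(omega) > 1.0 - delta:
                    a = omega
                elif g(omega) < delta:
                    b = omega
                else:
                    break
        x = x + omega * (x_bar - x)
        xs.append(x)
    return xs


if __name__ == "__main__":
    # nu = 1 case: f(x) = x + x^2 on [0, 1], minimizer 0 on the boundary with f'(0) = 1.
    print(newton_gs_1d(lambda x: x + x * x, lambda x: 1 + 2 * x, lambda x: 2.0, 0.8, 0.0, 1.0))
    # Degenerate interior minimizer: f(x) = x^4 on [-1, 2].
    print(newton_gs_1d(lambda x: x ** 4, lambda x: 4 * x ** 3, lambda x: 12 * x ** 2, 1.0, -1.0, 2.0, iters=10))
```

In the first example the iteration reaches the minimizer in one step and then stops, in line with the finite-termination statement for $\nu = 1$. In the second, the iterates contract by the factor $2/3$ per step, so $r_n = x_n^4$ decays geometrically, which is compatible with (and much better than) the "at least" bound of case (i).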

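Proof of Lemma 4.3. Let $\tilde B_\rho(\xi) = \{x \in B_\rho(\xi) : \langle f''(x)(x - \xi), x - \xi\rangle > 0\}$.
Then since $\langle f'(x), x - \xi\rangle \ge f(x) - f(\xi)$ when $f$ is convex, it is sufficient
to show that

$$
\frac{\langle f'(x), x - \xi\rangle}{\langle f''(x)(x - \xi), x - \xi\rangle} \ge c > 0, \quad \forall x \in \Omega \cap \tilde B_\rho(\xi),
\tag{4.28}
$$

for some $\rho > 0$.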

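One can expand $f(x) - f(\xi)$, $\langle f'(x), x - \xi\rangle$ and $\langle f''(x)(x - \xi), x - \xi\rangle$ with
Taylor's formula and use the Lipschitz continuity of $f^{(5)}$ to obtain

$$
f(x) - f(\xi) = \langle f'(\xi), x - \xi\rangle
 + \sum_{n=2}^{5}\frac{1}{n!}\langle f^{(n)}(\xi)(x - \xi)^{n-1}, x - \xi\rangle + r_1(\xi, x),
\tag{4.29}
$$

$$
\langle f'(x), x - \xi\rangle = \langle f'(\xi), x - \xi\rangle
 + \sum_{n=2}^{5}\frac{1}{(n-1)!}\langle f^{(n)}(\xi)(x - \xi)^{n-1}, x - \xi\rangle + r_2(\xi, x),
\tag{4.30}
$$

$$
\langle f''(x)(x - \xi), x - \xi\rangle
 = \sum_{n=2}^{5}\frac{1}{(n-2)!}\langle f^{(n)}(\xi)(x - \xi)^{n-1}, x - \xi\rangle + r_3(\xi, x),
\tag{4.31}
$$

where $f^{(k)}(\xi)$ is the $k$-th differential of $f$ at $\xi$ (for a discussion of
higher order derivatives and Taylor's formula in Banach spaces, see, e.g.,
Vainberg [27]). The terms $r_1(\xi, x)$, $r_2(\xi, x)$ and $r_3(\xi, x)$ satisfy

$$
|r_1(\xi, x)| = \left|\frac{1}{5!}\bigl\langle \bigl(f^{(5)}(\xi + \theta_1(x - \xi)) - f^{(5)}(\xi)\bigr)(x - \xi)^{4}, x - \xi\bigr\rangle\right|
\le \frac{L}{5!}\|x - \xi\|^{6},
\tag{4.32}
$$

$$
|r_2(\xi, x)| \le \frac{L}{4!}\|x - \xi\|^{6},
\tag{4.33}
$$

$$
|r_3(\xi, x)| \le \frac{L}{3!}\|x - \xi\|^{6}.
\tag{4.34}
$$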

Each $x \in \Omega$ can be expressed as $x = \xi + tu$ where $u$ is a unit vector and $t$
is a scalar parameter. Therefore, $(x - \xi) = tu$ and instead of (4.30), for
example, one can write

$$
\langle f'(x), x - \xi\rangle = t\langle f'(\xi), u\rangle
 + \sum_{n=2}^{5}\frac{t^{n}}{(n-1)!}\langle f^{(n)}(\xi)u^{n-1}, u\rangle + r_2(\xi, x),
\tag{4.35}
$$

which is valid for all pairings $(t, u)$ satisfying $\xi + tu \in \Omega$. Using
the notation $a_n(u) = \langle f^{(n)}(\xi)u^{n-1}, u\rangle / n!$, for $n = 1, \ldots, 5$, one obtains
from (1.10), (4.35) and the convexity of $f$ on $\Omega$

$$
\gamma t^{5} \le f(x) - f(\xi) \le \langle f'(x), x - \xi\rangle
\le \sum_{n=1}^{5} n\,a_n(u)\,t^{n} + \frac{L}{4!}\,t^{6}.
\tag{4.36}
$$

Consequently, for $\xi + tu \in \Omega$ and for $t$ sufficiently small, say $t \le \bar t$, one
has

$$
\sum_{n=1}^{5} n\,a_n(u)\,t^{n} \ge \frac{\gamma}{2}\,t^{5}.
\tag{4.37}
$$


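Also, in the simplified notation, (4.31) and (4.34) become

$$
\langle f''(x)(x - \xi), x - \xi\rangle \le \sum_{n=2}^{5} n(n-1)\,a_n(u)\,t^{n} + \frac{L}{3!}\,t^{6},
\tag{4.38}
$$

and for $x \in \Omega \cap \tilde B_{\bar t}(\xi)$, (4.28) has the lower bound

$$
\frac{\langle f'(x), x - \xi\rangle}{\langle f''(x)(x - \xi), x - \xi\rangle}
\ge \frac{\sum_{n=1}^{5} n\,a_n(u)\,t^{n} - \frac{L}{4!}\,t^{6}}{\sum_{n=2}^{5} n(n-1)\,a_n(u)\,t^{n} + \frac{L}{3!}\,t^{6}} > 0
\tag{4.39}
$$

in view of (4.36) and (4.37). Furthermore, using (4.37) one obtains for
$x \in \Omega \cap \tilde B_{\bar t}(\xi)$

$$
\frac{\frac{L}{4!}\,t^{6}}{\sum_{n=1}^{5} n\,a_n(u)\,t^{n}}
\le \frac{\frac{L}{3!}\,t^{6}}{\sum_{n=1}^{5} n\,a_n(u)\,t^{n}}
\le c_1 t, \quad \text{with } c_1 = \frac{L}{3\gamma},
$$

and (4.39) then gives

$$
\frac{\langle f'(x), x - \xi\rangle}{\langle f''(x)(x - \xi), x - \xi\rangle} \ge \frac{1 - c_1 t}{A(t, u) + c_1 t},
\tag{4.40}
$$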
where

$$
A(t, u) = \frac{\sum_{n=2}^{5} n(n-1)\,a_n(u)\,t^{n}}{\sum_{n=1}^{5} n\,a_n(u)\,t^{n}}.
\tag{4.41}
$$

If it can be shown that $A(t, u) \le c_2$ for some constant $c_2 < \infty$ and for
all $(t, u)$ pairs satisfying $\xi + tu \in \Omega \cap \tilde B_{\bar t}(\xi)$, then (4.28) will follow
from (4.40) since

$$
\frac{\langle f'(x), x - \xi\rangle}{\langle f''(x)(x - \xi), x - \xi\rangle} \ge \frac{1 - c_1 t}{c_2 + c_1 t} \ge c > 0
\tag{4.42}
$$

for $t$ sufficiently small. Note that since $\xi$ is an extremal $a_1(u) \ge 0$;
therefore, one can write

$$
A(t, u) \le \frac{4a_1(u)\,t + \sum_{n=2}^{5} n(n-1)\,a_n(u)\,t^{n}}{\sum_{n=1}^{5} n\,a_n(u)\,t^{n}}
= 4 - \frac{6a_2(u)\,t^{2} + 6a_3(u)\,t^{3} + 4a_4(u)\,t^{4}}{\sum_{n=1}^{5} n\,a_n(u)\,t^{n}}.
\tag{4.43}
$$

If it can be established that

$$
6a_2(u)\,t^{2} + 6a_3(u)\,t^{3} + 4a_4(u)\,t^{4} \ge -c_3\,t^{5}
\tag{4.44}
$$

for some $c_3 > 0$, and for $(t, u)$ satisfying $\xi + tu \in \Omega \cap \tilde B_{\bar t}(\xi)$ with
$t \le \hat t$, then (4.43), (4.44), and (4.37) would yield

$$
A(t, u) \le 4 + \frac{2c_3}{\gamma} \quad \text{for } (t, u) \text{ satisfying } \xi + tu \in \Omega \cap \tilde B_{\bar t}(\xi).
$$
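The algebra behind the equality in (4.43) is easy to get wrong by a coefficient, so a mechanical check is worthwhile. The snippet below is a small sketch using exact rational arithmetic; the function name and the sample coefficients are hypothetical, and the values $a_n$ are arbitrary rationals rather than derivatives of any particular functional.

```python
from fractions import Fraction


def check_identity(a, t):
    """Verify 4 a_1 t + sum n(n-1) a_n t^n == 4 * sum n a_n t^n - (6 a_2 t^2 + 6 a_3 t^3 + 4 a_4 t^4).

    `a` maps n = 1..5 to the coefficient a_n; `t` is the scalar parameter.
    """
    s1 = sum(n * a[n] * t ** n for n in range(1, 6))            # sum_{n=1}^5 n a_n t^n
    s2 = sum(n * (n - 1) * a[n] * t ** n for n in range(2, 6))  # sum_{n=2}^5 n(n-1) a_n t^n
    tail = 6 * a[2] * t ** 2 + 6 * a[3] * t ** 3 + 4 * a[4] * t ** 4
    return 4 * a[1] * t + s2 == 4 * s1 - tail


if __name__ == "__main__":
    coeffs = {n: Fraction(n - 3, n + 1) for n in range(1, 6)}
    print(check_identity(coeffs, Fraction(2, 7)))
```

Dividing both sides of the verified identity by $\sum_{n=1}^{5} n\,a_n(u)\,t^{n}$ gives exactly the right-hand side of (4.43).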

To show that (4.44) is true note that convexity of $f$ implies that

$$
0 \le \langle f''(x)(x - \xi), x - \xi\rangle \le \sum_{n=2}^{5} n(n-1)\,a_n(u)\,t^{n} + \frac{L}{3!}\,t^{6}.
$$

Also, $|a_5(u)| \le \|f^{(5)}(\xi)\| < \infty$. Therefore,

$$
2a_2(u)\,t^{2} + 6a_3(u)\,t^{3} + 12a_4(u)\,t^{4}
\ge -\left(20\,|a_5(u)| + \frac{L}{3!}\,t\right)t^{5} \ge -c_5\,t^{5}, \quad \text{for some } c_5 > 0,
\tag{4.45}
$$

for $t$ sufficiently small, say $t \in (0, \hat t)$. Writing (4.45) as

$$
6a_2(u) + 18a_3(u)\,t + 36a_4(u)\,t^{2} \ge -3c_5\,t^{3},
\tag{4.46}
$$

and making the change of variables $\tau = 3t$ yields

$$
6a_2(u) + 6a_3(u)\,\tau + 4a_4(u)\,\tau^{2} \ge -\frac{c_5\,\tau^{3}}{9},
\tag{4.47}
$$

which is valid for $\tau \in (0, 3\hat t)$. This proves that (4.44) is true at
least for $t \in (0, \hat t)$ with $c_3 = c_5/9$, and (4.28) follows for
$\rho = \min\bigl(\bar t, \hat t, \tfrac{1}{2c_1}\bigr)$.

QED

Remark 4.2. For certain extremals encountered in optimal control theory
the exponent $\nu$ in (1.11) can be calculated; this was discussed in
Remark 3.2. Let $U \subset \mathbb{R}^{m}$ be the unit ball, and as in Remark 3.2, let the
constraint set $\Omega$ be the set of functions

$$
\Omega = \{\text{measurable } u(\cdot) : [0, 1] \to U\},
$$

a subset of a Lebesgue space of functions from $[0, 1]$ into $\mathbb{R}^{m}$.
If $\xi(t)$ is piecewise continuous with range on the boundary of $U$, and if
an associated switching function grows fast enough near its zeros, then
it can be shown (Dunn [14]) that (1.11) (or (3.50)) holds for any $\nu$ in
the range $2 < \nu \le \infty$. For the conditional gradient method ($M_n = 0$ in the
(GS)), linear convergence is not guaranteed by Theorem 3.3, and computer
simulations suggest sublinear convergence for a simple example with such
a minimizer. On the other hand, Theorem 4.1 and Lemma 4.3 prove superlinear
convergence for Newton's method in this setting when $f$ satisfies the
hypotheses of Lemma 4.3 (see Remark 4.1 also).
5. Convergence Rates for Nonconvex Functionals

In Theorem 2.1 it was shown that limit points of a sequence
generated by the (GS) are extremals. This is true for differentiable,
possibly nonconvex functionals $f$ with Lipschitz continuous derivatives
$f'$. However, the convergence rate theorems presented in this thesis so
far have been limited to convex functionals; the proofs of these theorems
have depended heavily on the convexity property

$$
\langle f'(x), x - y\rangle \ge f(x) - f(y), \quad \forall x, y \in \Omega,
\tag{5.1}
$$

although it was indicated in Remarks 2.4 and 3.3 that such theorems could
be extended to a subclass of functionals which satisfy the weaker
pseudoconvexity condition

$$
\langle f'(x), x - \xi\rangle \ge k\,(f(x) - f(\xi)), \quad \text{for some } k > 0,
\tag{5.2}
$$

where $\xi \in \Omega$. In particular, linear convergence occurs for certain
methods in the (GS) when (5.1) or (5.2) holds with (1.10) or (1.11) for
$\nu = 2$. It will be shown in this chapter that conditions (5.2) and (1.10)
with $\nu = 2$ hold near an extremal of a (possibly nonconvex) functional $f$
if, for some $\rho > 0$, $f$ satisfies

$$
Q(f''(\xi), \xi, x) \ge \gamma\|x - \xi\|^{2}, \quad \forall x \in K_\Omega(\xi) \cap B_\rho(\xi).
\tag{5.3}
$$

Here $K_\Omega(\xi)$ is the tangent cone to $\Omega$ at $\xi$ with vertex at $\xi$ (see Chapter 1).
If the operator sequence in the (GS) satisfies (1.13) or if the structure
of the set $\Omega$ near $\xi$ is such that (1.11) holds with $\nu = 2$, i.e.,

$$
\langle f'(\xi), x - \xi\rangle \ge \gamma\|x - \xi\|^{2}, \quad \forall x \in \Omega, \text{ and some } \gamma > 0,
\tag{5.4}
$$

then Theorem 5.1 shows that if $\{x_n\}$ passes sufficiently near $\xi$, it will
converge to $\xi$ at a linear rate. When $\{M_n\}$ satisfies certain "quasi-Newton"
conditions, (5.3) need hold only for $x \in \Omega \cap B_\rho(\xi)$ for some $\rho > 0$ to
ensure linear or superlinear convergence rates; this will be established
in Theorem 5.2.

Lemma 5.1. Let $S \subset X$ be convex where $X$ is a Banach space. Let $f$ be
twice differentiable with $f''$ continuous at $\xi$ and let $f$ satisfy (5.3) at
$\xi$ for $x \in S \cap B_\rho(\xi)$ and some $\rho > 0$. Then for some $\rho_1 > 0$

$$
f(x) - f(\xi) \ge \frac{\gamma}{2}\|x - \xi\|^{2}, \quad \forall x \in S \cap B_{\rho_1}(\xi).
\tag{5.5}
$$

Proof. Using Taylor's formula for $x \in S \cap B_\rho(\xi)$ at $\xi$ one has

$$
\begin{aligned}
f(x) - f(\xi) &= \langle f'(\xi), x - \xi\rangle + \tfrac12\langle f''(\zeta)(x - \xi), x - \xi\rangle \\
&= Q(f''(\xi), \xi, x) + \tfrac12\langle (f''(\zeta) - f''(\xi))(x - \xi), x - \xi\rangle \\
&\ge \bigl(\gamma - \tfrac12\|f''(\zeta) - f''(\xi)\|\bigr)\|x - \xi\|^{2}
\end{aligned}
$$

for $\zeta$ between $x$ and $\xi$. By the continuity of $f''$ there exists a $\rho_1$ such
that for $\|x - \xi\| < \rho_1$, $\|f''(\zeta) - f''(\xi)\| < \gamma$; (5.5) now follows for
$x \in S \cap B_{\rho_1}(\xi)$.

QED

Remark 5.1. The proof of Lemma 5.1 is essentially the same as the proof
of Lemma 2.4 in [22].
Lemma 5.2. Let $\Omega \subset X$ be convex where $X$ is a Banach space. Let $f$ be
twice differentiable and let $f''$ be Lipschitz continuous for
$x \in K_\Omega(\xi) \cap B_\rho(\xi)$ for some $\rho > 0$. Let $f$ satisfy (5.3) for
$x \in K_\Omega(\xi) \cap B_\rho(\xi)$. Then for some $\rho_2 > 0$ and $k > 0$

$$
\langle f'(x), x - \xi\rangle \ge k\,(f(x) - f(\xi)), \quad \text{for } x \in K_\Omega(\xi) \cap B_{\rho_2}(\xi).
\tag{5.6}
$$

Proof. Since $K_\Omega(\xi)$ is convex it follows from Lemma 5.1 that
$f(x) - f(\xi) > 0$ for $x \in K_\Omega(\xi) \cap B_{\rho_1}(\xi)$, $x \ne \xi$, for some $\rho_1 < \rho$. It must be
shown, therefore, that for some $\rho_2 < \rho_1$ and some $k > 0$, and for $x \ne \xi$,

$$
\frac{\langle f'(x), x - \xi\rangle}{f(x) - f(\xi)} \ge k.
\tag{5.7}
$$

From Taylor's formula and the Lipschitz continuity of $f''$, one can write
for $x \ne \xi$ and $x \in K_\Omega(\xi) \cap B_\rho(\xi)$

$$
\frac{f(x) - f(\xi)}{Q(f''(\xi), \xi, x)}
\le \frac{\langle f'(\xi), x - \xi\rangle + \tfrac12\langle f''(\xi)(x - \xi), x - \xi\rangle + c_1\|x - \xi\|^{3}}{Q(f''(\xi), \xi, x)}
\le 1 + c_2\|x - \xi\|,
\tag{5.8}
$$

where $c_2 = c_1/\gamma$.
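Similarly, one obtains

$$
\begin{aligned}
\frac{\langle f'(x), x - \xi\rangle}{Q(f''(\xi), \xi, x)}
&\ge \frac{\langle f'(\xi), x - \xi\rangle + \langle f''(\xi)(x - \xi), x - \xi\rangle - c_1\|x - \xi\|^{3}}{Q(f''(\xi), \xi, x)} \\
&\ge 1 + \frac{\tfrac12\langle f''(\xi)(x - \xi), x - \xi\rangle}{Q(f''(\xi), \xi, x)} - c_2\|x - \xi\|.
\end{aligned}
\tag{5.9}
$$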


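Let $x = \xi + tu$ for $u$ a unit vector in $K_\Omega(\xi)$, and let $R(t, u)$ be defined
for $t \in (0, \rho]$ by

$$
R(t, u) = \frac{\tfrac12 t^{2}\langle f''(\xi)u, u\rangle}{t\langle f'(\xi), u\rangle + \tfrac12 t^{2}\langle f''(\xi)u, u\rangle}.
\tag{5.10}
$$

Whenever $\langle f''(\xi)u, u\rangle \ge 0$, then $R(t, u) \ge 0$ and (5.9) yields

$$
\frac{\langle f'(x), x - \xi\rangle}{Q(f''(\xi), \xi, x)} \ge 1 - c_2 t.
\tag{5.11}
$$

On the other hand for any $u$ for which $\langle f''(\xi)u, u\rangle < 0$, it must be shown
that $|R(t, u)| \le c_3 t$ for some $c_3 < \infty$. If this is so, then from (5.9) one
has

$$
\frac{\langle f'(x), x - \xi\rangle}{Q(f''(\xi), \xi, x)} \ge 1 - |R(t, u)| - c_2 t \ge 1 - (c_3 + c_2)\,t,
\tag{5.12}
$$

and for $t$ sufficiently small (5.7) will follow from (5.8), (5.11), and
(5.12). For $u \in K_\Omega(\xi)$ and for $t \in [0, \rho]$, (5.3) gives

$$
t\langle f'(\xi), u\rangle + \tfrac12 t^{2}\langle f''(\xi)u, u\rangle \ge \gamma t^{2}.
$$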
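In particular, for $t = \rho$ and $\langle f''(\xi)u, u\rangle < 0$, one has

$$
\langle f'(\xi), u\rangle \ge \bigl(\gamma - \tfrac12\langle f''(\xi)u, u\rangle\bigr)\rho > 0,
$$

in which case, one can write for $t \in [0, \rho]$

$$
t\langle f'(\xi), u\rangle + \tfrac12 t^{2}\langle f''(\xi)u, u\rangle
\ge t\gamma\rho - \tfrac12 t(\rho - t)\langle f''(\xi)u, u\rangle \ge t\gamma\rho.
$$

Consequently, for $t \in (0, \rho]$ and $\langle f''(\xi)u, u\rangle < 0$, (5.10) yields

$$
|R(t, u)| \le \frac{t^{2}\,|\langle f''(\xi)u, u\rangle|}{2t\gamma\rho} \le \frac{\|f''(\xi)\|}{2\gamma\rho}\,t = c_3 t.
$$

QED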
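A scalar example shows how (5.3) can hold for a functional that is not convex. The example below is an illustration constructed for this purpose, not one from the thesis; it assumes $X = \mathbb{R}$, $\Omega = [0, 1]$, $\xi = 0$ and the concave functional $f(x) = x - x^{2}$, for which $K_\Omega(\xi) = [0, \infty)$, $f'(\xi) = 1$ and $f''(\xi) = -2$.

```latex
\begin{align*}
Q(f''(\xi), \xi, x) &= x - x^{2} \ge \gamma x^{2}
  \iff 0 \le x \le \frac{1}{1+\gamma},
  && \text{so (5.3) holds with } \rho = \frac{1}{1+\gamma}; \\
f(x) - f(\xi) &= x - x^{2} \ge \gamma x^{2} \ge \frac{\gamma}{2}x^{2}
  && \text{on the same interval, as in (5.5)}; \\
\frac{\langle f'(x), x - \xi\rangle}{f(x) - f(\xi)} &= \frac{1 - 2x}{1 - x} \ge \frac12
  \quad \text{for } 0 < x \le \frac13,
  && \text{so (5.6) holds with } k = \frac12,\ \rho_2 = \frac13.
\end{align*}
```

The convexity inequality (5.1) fails for this $f$: with $x = 0$ and $y = 1$ it would require $-1 \ge 0$.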


Roughly speaking, Lemmas 5.1 and 5.2 show that the functional $f$ is
"locally pseudoconvex with respect to $\xi$" when condition (5.3) holds.

Theorem 5.1. Let $\Omega \subset X$ be convex where $X$ is a Banach space. Let $f$ be
twice differentiable with $f''$ Lipschitz continuous for $x \in K_\Omega(\xi) \cap B_\rho(\xi)$.
Let $f$ satisfy (5.3) for $x \in K_\Omega(\xi) \cap B_\rho(\xi)$. Let either (1.13) be satisfied
by the operator sequence $\{M_n\}$, or the structure of the set $\Omega$ near $\xi$ be such
that (5.4) holds. If $\{x_n\}$, a sequence generated by the (GS), comes
sufficiently near $\xi$, then $x_n \to \xi$ and $r_n = f(x_n) - f(\xi) = O(\lambda^{n})$, for some
$\lambda \in (0, 1)$.

Proof. By Lemmas 5.1 and 5.2 and condition (5.3) there is a $\rho_1$ such that

$$
\langle f'(x), x - \xi\rangle \ge k\,(f(x) - f(\xi)) \ge \frac{k\gamma}{2}\|x - \xi\|^{2}
\tag{5.13}
$$

for $x \in \Omega \cap B_{\rho_1}(\xi) \subset K_\Omega(\xi) \cap B_{\rho_1}(\xi)$.
Furthermore, if $L_1$ is a Lipschitz constant for $f'$, and if $\{M_n\}$ satisfies
(1.13), then (3.32) in Lemma 3.1 becomes

$$
\frac{\mu}{2}\|\bar x_n - \xi\|^{2} + \langle f'(\xi), \bar x_n - \xi\rangle
\le (L_1 + K)\,\|x_n - \xi\|\,\|\bar x_n - \xi\|, \quad \forall n \ge 0,
\tag{5.14}
$$

where $K$ is a uniform bound on $\|M_n\|$. On the other hand, if (5.4) holds
then (3.32) becomes

$$
\tfrac12\langle M_n(\bar x_n - \xi), \bar x_n - \xi\rangle + \gamma\|\bar x_n - \xi\|^{2}
\le (L_1 + K)\,\|x_n - \xi\|\,\|\bar x_n - \xi\|, \quad \forall n \ge 0.
\tag{5.15}
$$

Since $f(x) - f(\xi) \ge 0$ in a small neighborhood around $\xi$, it follows that
$\xi$ is an extremal, i.e., $\langle f'(\xi), x - \xi\rangle \ge 0$, $\forall x \in \Omega$; hence (5.14) yields

$$
\|\bar x_n - \xi\| \le c\,\|x_n - \xi\|, \quad \forall n \ge 0, \text{ and } c = 2(L_1 + K)/\mu.
\tag{5.16}
$$

Similarly the nonnegativity of $M_n$ in (5.15) gives (5.16) with
$c = (L_1 + K)/\gamma$. In either case, when $0 < \|x_n - \xi\| < \rho_2 = \rho_1/c$, then
$\bar x_n \in \Omega \cap B_{\rho_1}(\xi)$. Let $A = \{x \in \Omega \cap B_{\rho_1}(\xi) : f(x) - f(\xi) < \tfrac{\gamma}{2}\rho_2^{2}\}$. Then
(5.13) implies that $A \subset \Omega \cap B_{\rho_2}(\xi)$. Since $f$ is continuous there is a
$\rho_3 > 0$, such that if $x \in \Omega \cap B_{\rho_3}(\xi)$ then $x \in A$. It follows then that if
$x_N \in \Omega \cap B_{\rho_3}(\xi)$ for some $N \ge 0$, then $x_n \in \Omega \cap B_{\rho_2}(\xi)$ for $n \ge N$. This is
true because $\bar x_N \in \Omega \cap B_{\rho_1}(\xi)$, and since $x_{N+1}$ is a convex combination of
$x_N$ and $\bar x_N$, then $x_{N+1} \in \Omega \cap B_{\rho_1}(\xi)$. But from Theorem 2.1 line (2.25) and
Lemma 5.1, $f(\xi) \le f(x_{N+1}) \le f(x_N)$ and, therefore, $x_{N+1} \in A \subset \Omega \cap B_{\rho_2}(\xi)$.
By an induction argument $x_n \in \Omega \cap B_{\rho_2}(\xi)$ for $n \ge N$. Also, since
$\langle f'(x_n), x_n - \xi\rangle \ge \tfrac{k\gamma}{2}\|x_n - \xi\|^{2}$ for $n \ge N$ it follows that $\xi \in \Psi(x_n)$, and
Lemma 2.2 and (5.13) yield
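$$
\begin{aligned}
\langle f'(x_n), x_n - \bar x_n\rangle
&\ge
\begin{cases}
\langle f'(x_n), x_n - \xi\rangle & \text{if } \langle M_n(x_n - \xi), x_n - \xi\rangle = 0, \\[1ex]
\tfrac12\min\left\{\langle f'(x_n), x_n - \xi\rangle,\ \dfrac{\langle f'(x_n), x_n - \xi\rangle^{2}}{K\|x_n - \xi\|^{2}}\right\} & \text{if } \langle M_n(x_n - \xi), x_n - \xi\rangle > 0
\end{cases} \\
&\ge \frac{k}{2}\min\left\{f(x_n) - f(\xi),\ \frac{k\gamma}{2K}\,(f(x_n) - f(\xi))\right\} \\
&\ge c_4\,(f(x_n) - f(\xi)),
\end{aligned}
\tag{5.17}
$$

where $c_4 = \tfrac{k}{2}\min\{1, \tfrac{k\gamma}{2K}\} > 0$. From (5.16), (5.17), (2.23) and the
triangle inequality one now has

$$
\omega_n \ge \min\left\{1,\ \frac{2\delta c_4\,(f(x_n) - f(\xi))}{L_1\|\bar x_n - x_n\|^{2}}\right\}
\ge \min\left\{1,\ \frac{2\delta c_4\,(f(x_n) - f(\xi))}{L_1(1 + c)^{2}\|x_n - \xi\|^{2}}\right\}
\ge \min\left\{1,\ \frac{\delta c_4\gamma}{L_1(1 + c)^{2}}\right\} = \underline{\omega} > 0,
\tag{5.18}
$$

where $L_1$ is a Lipschitz constant for $f'$. Finally, (2.25) gives

$$
f(x_n) - f(\xi) - f(x_{n+1}) + f(\xi) \ge \delta\underline{\omega}\,c_4\,(f(x_n) - f(\xi)),
$$

or equivalently

$$
r_{n+1} \le (1 - \delta\underline{\omega}\,c_4)\,r_n,
$$

and $r_n = O\bigl((1 - \delta\underline{\omega}\,c_4)^{n}\bigr) = O(\lambda^{n})$. The linear convergence of $\{r_n\}$, together
with (5.13), implies that $\{\|x_n - \xi\|\}$ converges to zero at a linear rate.

QED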

The hypotheses of Theorem 5.1 can be weakened somewhat when the
operators $M_n$ are so-called "quasi-Newton" operators. If the $M_n$'s are
symmetric and approach the second derivative operator $f''(\xi)$ in the sense
of (1.18) - (1.21) then (5.3) need hold only for $x \in \Omega \cap B_\rho(\xi)$ and $f''$
need only be continuous at $\xi$ to establish linear and superlinear rates
of convergence near $\xi$.

Theorem 5.2. Let $\Omega \subset X$ be convex where $X$ is a Banach space. Let $f$ be
twice differentiable with $f''$ continuous. Let $f$ satisfy (5.3) for
$x \in \Omega \cap B_\rho(\xi)$ for some $\rho > 0$ and $\xi \in \Omega$. Let $\{M_n\}$ be a sequence of
symmetric operators in the (GS), i.e.,

$$
\langle M_n x, y\rangle = \langle M_n y, x\rangle, \quad \forall x, y \in X,\ n \ge 0.
$$

Then:

(i) If (1.18) or (1.19) holds with $\varepsilon$ sufficiently small and $n \ge N$,
and if $\{x_n\}$ is a sequence generated by the (GS), there exists a $\rho_2 > 0$
such that if $x_{n_0} \in \Omega \cap B_{\rho_2}(\xi)$ for some $n_0 \ge N$, then $\{\|x_n - \xi\|\}$ converges
to zero at a linear rate, i.e., for some $\lambda \in (0, 1)$

$$
\|x_{n+1} - \xi\| \le \lambda\,\|x_n - \xi\|, \quad \text{for } n \ge n_0.
$$

(ii) If either (1.20) or (1.21) holds then $\{\|x_n - \xi\|\}$ converges
superlinearly, i.e., either $x_n = \xi$ eventually, or

$$
\lim_{n \to \infty}\frac{\|x_{n+1} - \xi\|}{\|x_n - \xi\|} = 0.
$$


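Proof of (i). It will first be shown that if $x_{n_0}$ is sufficiently close
to $\xi$ with $n_0 \ge N$, then $\|\bar x_{n_0} - \xi\| \le c\,\|x_{n_0} - \xi\|$ for some $c \in (0, 1)$, and,
by induction, that $\|\bar x_n - \xi\| \le c\,\|x_n - \xi\|$ for $n \ge n_0$. Then it will be
established that $\omega_n \ge \underline{\omega} > 0$ for $n \ge n_0$ and that this implies that
$\|x_{n+1} - \xi\| \le \lambda\,\|x_n - \xi\|$ for some $\lambda \in (0, 1)$ and for $n \ge n_0$. To show that
$\|\bar x_n - \xi\| \le c\,\|x_n - \xi\|$ for some $c \in (0, 1)$ and $n \ge n_0$, let $\tilde x_n$ satisfy for
any $n \ge 0$

$$
0 \le Q(M_n, x_n, \xi) - Q(M_n, x_n, \tilde x_n).
\tag{5.19}
$$

Then inequality (3.31) in Lemma 3.1 holds with $\bar x_n$ replaced by $\tilde x_n$, and with
the symmetry of $M_n$ one can write

$$
\tfrac12\langle M_n(\tilde x_n - \xi), \tilde x_n - \xi\rangle + \langle f'(\xi), \tilde x_n - \xi\rangle
\le \langle f'(\xi) - f'(x_n), \tilde x_n - \xi\rangle + \langle M_n(x_n - \xi), \tilde x_n - \xi\rangle.
$$

With the Mean Value Theorem, this implies

$$
\tfrac12\langle (M_n - f''(\xi))(\tilde x_n - \xi), \tilde x_n - \xi\rangle + Q(f''(\xi), \xi, \tilde x_n)
\le \langle (M_n - f''(\zeta_n))(x_n - \xi), \tilde x_n - \xi\rangle
$$

for $\zeta_n = x_n + \theta_n(\xi - x_n)$, $\theta_n \in [0, 1]$.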

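Consequently, for $\tilde x_n \in \Omega \cap B_\rho(\xi)$, condition (5.3) and the triangle
inequality yield for $x_n \ne \xi$,

$$
\left(\gamma - \frac{\|(M_n - f''(\xi))(\tilde x_n - \xi)\|}{\|\tilde x_n - \xi\|}\right)\|\tilde x_n - \xi\|^{2}
\le \left(\frac{\|(M_n - f''(\xi))(x_n - \xi)\|}{\|x_n - \xi\|} + \|f''(\xi) - f''(\zeta_n)\|\right)\|x_n - \xi\|\,\|\tilde x_n - \xi\|,
\tag{5.20}
$$

or

$$
\bigl(\gamma - \|M_n - f''(\xi)\|\bigr)\|\tilde x_n - \xi\|^{2}
\le \bigl(\|M_n - f''(\xi)\| + \|f''(\xi) - f''(\zeta_n)\|\bigr)\|x_n - \xi\|\,\|\tilde x_n - \xi\|.
\tag{5.21}
$$

Suppose (1.19) holds. Then since $f''$ is continuous, there is a $\rho_1 \in (0, \rho)$
such that if $x_n \in \Omega \cap B_{\rho_1}(\xi)$ and $\tilde x_n \in \Omega \cap B_\rho(\xi)$ for $n \ge N$, then (5.20)
and (1.19) with $\varepsilon$ sufficiently small imply

$$
\|\tilde x_n - \xi\| \le c\,\|x_n - \xi\|,
\tag{5.22}
$$

where

$$
c = \sup_{\zeta \in \Omega \cap B_{\rho_1}(\xi)}\left(\frac{\varepsilon + \|f''(\xi) - f''(\zeta)\|}{\gamma - \varepsilon}\right) < 1.
$$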


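Let $x_{n_0} \in \Omega \cap B_{\rho_1}(\xi)$ for some $n_0 \ge N$. Suppose that $\|\bar x_{n_0} - \xi\| > \rho$, and
let

$$
\tilde x_{n_0} = \xi + \rho\,\frac{\bar x_{n_0} - \xi}{\|\bar x_{n_0} - \xi\|}.
$$

Since $Q(M_{n_0}, x_{n_0}, \cdot)$ is convex and $\bar x_{n_0}$
minimizes $Q(M_{n_0}, x_{n_0}, \cdot)$ on $\Omega$, one has

$$
Q(M_{n_0}, x_{n_0}, \bar x_{n_0}) \le Q(M_{n_0}, x_{n_0}, \tilde x_{n_0}) \le Q(M_{n_0}, x_{n_0}, \xi).
\tag{5.23}
$$

Therefore, $\tilde x_{n_0}$ satisfies (5.19), and since $\|\tilde x_{n_0} - \xi\| = \rho$, (5.22) holds.
But (5.22) gives $\|\tilde x_{n_0} - \xi\| \le c\,\|x_{n_0} - \xi\| < \rho_1 < \rho$, and this contradiction
proves that $\|\bar x_{n_0} - \xi\| \le \rho$. From (5.23), (5.19), and (5.22) there results

$$
\|\bar x_{n_0} - \xi\| \le c\,\|x_{n_0} - \xi\|, \quad \text{for some } c \in (0, 1).
$$

It follows by induction that for $n \ge n_0$, $x_n \in \Omega \cap B_{\rho_1}(\xi)$ and

$$
\|\bar x_n - \xi\| \le c\,\|x_n - \xi\|.
\tag{5.24}
$$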

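If (1.18) holds then (5.24) follows by the same argument. To prove that
$\{\|x_n - \xi\|\}$ converges at a linear rate it suffices to show that $\omega_n \ge \underline{\omega} > 0$
since one can then write for $n \ge n_0$, $x_n \ne \xi$,

$$
\begin{aligned}
\frac{\|x_{n+1} - \xi\|}{\|x_n - \xi\|}
&= \frac{\|x_n + \omega_n(\bar x_n - x_n) - \xi\|}{\|x_n - \xi\|} \\
&\le \frac{(1 - \omega_n)\|x_n - \xi\| + \omega_n\|\bar x_n - \xi\|}{\|x_n - \xi\|} \\
&\le (1 - \omega_n) + \omega_n c \\
&= 1 - \omega_n(1 - c) \\
&\le 1 - \underline{\omega}(1 - c) < 1.
\end{aligned}
\tag{5.25}
$$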
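Unrolling (5.25) makes the linear rate explicit. The short derivation below is a sketch that only assumes a uniform lower bound $\underline{\omega} \in (0, 1]$ on the step lengths and the contraction constant $c \in (0, 1)$ from (5.24); the symbol $\lambda$ is the rate constant named in the theorem.

```latex
\begin{align*}
\lambda &:= 1 - \underline{\omega}\,(1 - c) \in (0, 1), \\
\|x_{n} - \xi\| &\le \lambda\,\|x_{n-1} - \xi\| \le \cdots \le \lambda^{\,n - n_0}\,\|x_{n_0} - \xi\|,
  \qquad n \ge n_0 .
\end{align*}
```

For instance, $\underline{\omega} = 1$ and $c = \tfrac12$ give $\lambda = \tfrac12$, while a smaller step bound $\underline{\omega} = \tfrac14$ with the same $c$ gives the slower rate $\lambda = \tfrac78$.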
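To prove that $\omega_n \ge \underline{\omega} > 0$ observe that for $x_n \in \Omega \cap B_\rho(\xi)$

$$
\begin{aligned}
-Q(M_n, x_n, \bar x_n) &\ge -Q(M_n, x_n, \xi) \\
&= Q(f''(\xi), \xi, x_n) - Q(f''(\xi), \xi, x_n) - Q(M_n, x_n, \xi) \\
&\ge \gamma\|x_n - \xi\|^{2} - \langle f'(\xi) - f'(x_n), x_n - \xi\rangle
 - \tfrac12\langle f''(\xi)(x_n - \xi), x_n - \xi\rangle - \tfrac12\langle M_n(x_n - \xi), x_n - \xi\rangle.
\end{aligned}
\tag{5.26}
$$

The Mean Value Theorem, the triangle inequality, and (5.26) give for $x_n \ne \xi$

$$
\begin{aligned}
&\langle f'(x_n), x_n - \bar x_n\rangle - \tfrac12\langle M_n(x_n - \bar x_n), x_n - \bar x_n\rangle \\
&\quad\ge \gamma\|x_n - \xi\|^{2} + \langle f''(\zeta_n)(x_n - \xi), x_n - \xi\rangle
 - \tfrac12\langle f''(\xi)(x_n - \xi), x_n - \xi\rangle - \tfrac12\langle M_n(x_n - \xi), x_n - \xi\rangle \\
&\quad\ge \left[\gamma - \tfrac12\|f''(\zeta_n) - f''(\xi)\|
 - \tfrac12\,\frac{\|(f''(\zeta_n) - f''(\xi))(x_n - \xi)\| + \|(f''(\xi) - M_n)(x_n - \xi)\|}{\|x_n - \xi\|}\right]\|x_n - \xi\|^{2} \\
&\quad\ge \left(\gamma - \|f''(\zeta_n) - f''(\xi)\| - \tfrac12\,\frac{\|(f''(\xi) - M_n)(x_n - \xi)\|}{\|x_n - \xi\|}\right)\|x_n - \xi\|^{2},
\end{aligned}
\tag{5.27}
$$

where $\zeta_n = x_n + \theta_n(\xi - x_n)$, $\theta_n \in [0, 1]$, and hence

$$
\langle f'(x_n), x_n - \bar x_n\rangle - \tfrac12\langle M_n(x_n - \bar x_n), x_n - \bar x_n\rangle
\ge \bigl(\gamma - \|f''(\zeta_n) - f''(\xi)\| - \tfrac12\|f''(\xi) - M_n\|\bigr)\|x_n - \xi\|^{2}.
\tag{5.28}
$$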
Suppose (1.19) holds with $\varepsilon$ small enough. Once again, since $f''$ is
continuous, there is a $\rho_2 \in (0, \rho_1)$ such that if $x_n \in \Omega \cap B_{\rho_2}(\xi)$ for
$n \ge N$ then (5.27) yields

$$
\langle f'(x_n), x_n - \bar x_n\rangle \ge \tilde\gamma\,\|x_n - \xi\|^{2}, \quad \text{for some } \tilde\gamma > 0.
\tag{5.29}
$$

From (2.23), (5.22), (5.29) and the triangle inequality one has, for
$x_n \ne \xi$ and $n \ge n_0 \ge N$,

$$
\omega_n \ge \min\left\{1,\ \frac{2\delta\tilde\gamma\,\|x_n - \xi\|^{2}}{L\|\bar x_n - x_n\|^{2}}\right\}
\ge \min\left\{1,\ \frac{2\delta\tilde\gamma\,\|x_n - \xi\|^{2}}{L\bigl(\|x_n - \xi\| + \|\bar x_n - \xi\|\bigr)^{2}}\right\}
\ge \min\left\{1,\ \frac{2\delta\tilde\gamma}{L(1 + c)^{2}}\right\} = \underline{\omega} > 0.
\tag{5.30}
$$

Finally, if $x_{n_0} \in \Omega \cap B_{\rho_2}(\xi)$ for some $n_0 \ge N$, then from (5.25) and (5.22)
it follows that $\{\|x_n - \xi\|\}$ converges to zero at a linear rate. The same
result can be established when (1.18) holds using (5.28) and the same
argument.

Proof of (ii). Condition (1.20) implies condition (1.18); therefore,
the results of (i), (1.20), (5.21) and the continuity of $f''$ yield

$$
\|\bar x_n - \xi\| \le \lambda_n\,\|x_n - \xi\|, \quad \text{where } \lambda_n \to 0 \text{ as } n \to \infty.
\tag{5.31}
$$


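Also, (5.31) follows from the results of (i), (1.21), (5.20) and the
continuity of $f''$. One need only show that after a finite number of
iterations $\omega_n = 1$ for the remaining iterations. Fix $\delta \in (0, \tfrac12)$. Then
from (4.22) one has, for $x_n$ not an extremal,

$$
\begin{aligned}
g(x_n, \bar x_n, 1) &= 1 - \frac{\tfrac12\langle f''(\zeta_n)(\bar x_n - x_n), \bar x_n - x_n\rangle}{\langle f'(x_n), x_n - \bar x_n\rangle} \\
&\ge 1 - \frac{\tfrac12\langle M_n(\bar x_n - x_n), \bar x_n - x_n\rangle}{\langle f'(x_n), x_n - \bar x_n\rangle}
 - \frac{\tfrac12\|(f''(\zeta_n) - M_n)(\bar x_n - x_n)\|\,\|\bar x_n - x_n\|}{\langle f'(x_n), x_n - \bar x_n\rangle}.
\end{aligned}
\tag{5.32}
$$

From the triangle inequality, (5.32) and (4.24), which is valid for
symmetric operators, there results

$$
\begin{aligned}
g(x_n, \bar x_n, 1) \ge \frac12
&- \frac{\|f''(\zeta_n) - f''(\xi)\|\,\|\bar x_n - x_n\|^{2}}{2\langle f'(x_n), x_n - \bar x_n\rangle} \\
&- \left(\frac{\|(f''(\xi) - M_n)(x_n - \xi)\|}{\|x_n - \xi\|}\,\|x_n - \xi\|
 + \frac{\|(f''(\xi) - M_n)(\bar x_n - \xi)\|}{\|\bar x_n - \xi\|}\,\|\bar x_n - \xi\|\right)
 \frac{\|\bar x_n - x_n\|}{2\langle f'(x_n), x_n - \bar x_n\rangle}.
\end{aligned}
\tag{5.33}
$$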

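Also, (5.22), (5.29) and the triangle inequality yield for $x_n \in \Omega \cap B_{\rho_2}(\xi)$,

$$
\begin{aligned}
g(x_n, \bar x_n, 1) \ge \frac12
&- \frac{(1 + c)^{2}}{2\tilde\gamma}\,\|f''(\zeta_n) - f''(\xi)\|
 - \frac{(1 + c)}{2\tilde\gamma}\,\frac{\|(f''(\xi) - M_n)(x_n - \xi)\|}{\|x_n - \xi\|} \\
&- \frac{c(1 + c)}{2\tilde\gamma}\,\frac{\|(f''(\xi) - M_n)(\bar x_n - \xi)\|}{\|\bar x_n - \xi\|},
\end{aligned}
\tag{5.34}
$$

or

$$
g(x_n, \bar x_n, 1) \ge \frac12
 - \frac{(1 + c)^{2}}{2\tilde\gamma}\,\|f''(\zeta_n) - f''(\xi)\|
 - \frac{(1 + c)}{2\tilde\gamma}\,\|f''(\xi) - M_n\|
 - \frac{c(1 + c)}{2\tilde\gamma}\,\|f''(\xi) - M_n\|.
\tag{5.35}
$$

Finally, the continuity of $f''$ and (1.20) in (5.35) or (1.21) in (5.34)
give $g(x_n, \bar x_n, 1) \ge \delta$ for $n \ge N_1(\delta)$ with $N_1(\delta) < \infty$.

QED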


Remark  5.2.  The  proof  of  Theorem  5.2  is  a  modification  of  the  proofs 
in  [22]  for  Newton’s  method. 
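The contrast between Theorems 5.1 and 5.2 can be seen on a scalar nonconvex example. The sketch below is illustrative only and rests on several assumptions that are not in the thesis: $X = \mathbb{R}$, $\Omega = [-0.4, 0.4]$, $f(x) = x^{2} - x^{4}$ with the interior extremal $\xi = 0$, a constant operator $M_n = 4$ for the linear case, and $M_n = f''(x_n)$ for the quasi-Newton case. The full step is accepted whenever $g(x_n, \bar x_n, 1) \ge \delta$; no step-length search is implemented, so the routine raises if that test fails.

```python
def f(x):
    return x ** 2 - x ** 4          # nonconvex on the real line, locally convex near 0


def df(x):
    return 2.0 * x - 4.0 * x ** 3


def d2f(x):
    return 2.0 - 12.0 * x ** 2      # positive for |x| < 1/sqrt(6)


def gs_iterates(x0, model, lo=-0.4, hi=0.4, delta=0.25, iters=8):
    """Scalar (GS) sketch; `model(x)` returns the positive scalar playing the role of M_n."""
    xs = [x0]
    x = x0
    for _ in range(iters):
        m = model(x)
        x_bar = min(max(x - df(x) / m, lo), hi)   # minimizer of Q(m, x, .) on [lo, hi]
        if x_bar == x:                            # extremal reached
            break
        g1 = (f(x) - f(x_bar)) / (df(x) * (x - x_bar))
        if g1 < delta:
            raise RuntimeError("full step rejected; a step-length search would be needed")
        x = x_bar
        xs.append(x)
    return xs


if __name__ == "__main__":
    linear = gs_iterates(0.2, lambda x: 4.0)      # constant operator
    newton = gs_iterates(0.2, d2f)                # M_n = f''(x_n)
    print([linear[k + 1] / linear[k] for k in range(len(linear) - 1)])
    print([abs(newton[k + 1] / newton[k]) for k in range(3)])
```

With the constant operator the ratio $x_{n+1}/x_n = \tfrac12 + x_n^{2}$ settles near $\tfrac12$, a linear rate; with $M_n = f''(x_n)$ the successive ratios shrink toward zero, which is the superlinear behaviour described in part (ii).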